This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Data Structures

Data structures are ways of organizing and storing information in a computer program so that it can be accessed and modified efficiently. As programmers, it is essential to understand the different data structures available and know when to apply each one to optimize the performance and utility of our programs.

A data structure is a particular way of organizing data in the computer’s memory so that it can be used efficiently. Data structures come in many forms, such as arrays, lists, stacks, queues, graphs, trees, hashes, etc.

Each structure organizes the data according to a specific logical model and supports efficient operations to access, modify, add, and delete elements according to that model. For example, an array organizes elements sequentially in memory to facilitate random access by indices. A linked list connects elements in memory using “nodes” with references to the next node to facilitate insertion and deletion.

By choosing the appropriate data structure for the task to be solved, we can write more efficient programs and reduce computational complexity, using fewer resources like memory and processing.

Data structures help us:

  • Organize large amounts of data to make them easier to access and modify.

  • Model complex relationships between data, such as with graphs and trees.

  • Access and modify data efficiently, optimizing performance.

  • Reuse existing code and data structures instead of having to rewrite solutions from scratch.

For example, a program that needs to store thousands of user records benefits from using a hash data structure to associate each user with data like first name, last name, email, etc. This way specific users can be found very quickly without having to iterate over the entire collection.

Another example are binary search trees, which allow finding elements very quickly in ordered sets of millions of elements. This is achieved by discarding halves of the tree as the desired element is searched.


Types of data structures

There are many types of data structures. Below are some useful categories to classify them.

According to relationship between elements

  • Linear: Elements are organized sequentially one after the other. For example, arrays, lists, stacks, queues.

  • Non-linear: Elements are organized in a hierarchy or graph. This is the case with trees and graphs.

According to element type

  • Homogeneous: Store a single data type. For example, arrays in a language like Java.

  • Heterogeneous: Allow different data types. Objects, records are examples of this classification.

According to access mode

  • Sequential access: Elements can only be accessed in sequential order. For example, linked lists.

  • Direct access: Any element can be accessed directly by its position. Arrays fall into this group.

  • Associative access: Elements are accessed by an associated key. Dictionaries, hashes fall into this category.

According to functionality

  • Arrays: Fast access to elements by index but difficult to insert/delete.

  • Lists: Easy to insert/delete anywhere but sequential access is slow.

  • Stacks: LIFO (Last In First Out) access. Useful for undo/redo.

  • Queues: FIFO (First In First Out) access. Useful for event processing.

  • Trees: Allow modeling hierarchical relationships like file directories or task dependencies.

  • Graphs: Allow modeling interconnectivity networks like maps, social relationships, etc.

  • Hashes / Dictionaries: Associate elements with unique keys for ultra fast access.

This classification is not exhaustive but gives an idea of the diversity of data structures and their different properties that allow us to efficiently model complex problems.


Example

Let’s look at a Python example to see how a data structure is created and used. Suppose we want to represent a print queue where prints are processed in order of arrival (FIFO).

First, we define a PrintQueue class to represent our queue:

class PrintQueue:
    def __init__(self):
        self.items = []

    def enqueue(self, item):
        self.items.append(item)

    def dequeue(self):
        return self.items.pop(0)

    def is_empty(self):
        return len(self.items) == 0

Then we use it to add prints and process them in order:

print_queue = PrintQueue()

print_queue.enqueue("document1.doc")
print_queue.enqueue("image3.jpg")
print_queue.enqueue("presentation.ppt")

while not print_queue.is_empty():
    next_item = print_queue.dequeue()
    print(f"Printing {next_item}")

This will print:

Printing document1.doc
Printing image3.jpg
Printing presentation.ppt

With a data structure like the queue we implement FIFO logic in a simple, reusable way. And this is just one example, the possibilities are endless!


Conclusion

Data structures are fundamental programming tools that allow us to optimally organize information to solve complex problems. Knowing the different types of structures available, like arrays, lists, stacks, queues, hashes, graphs, and trees, allows us to build more efficient programs. I hope this introduction has given you some knowledge and tools to start mastering this exciting topic!


1 - Arrays

Arrays are fundamental data structures in programming that allow storing and organizing collections of data of the same type. Mastering the use of arrays is essential for any programmer.

An array is a data structure that represents a set of elements, which are accessed through contiguous numeric indices ranging from 0 to the size of the array minus 1. Arrays provide fast, direct access to elements based on their position.

In languages like Python and Ruby, arrays are known as ’lists’. In Javascript they are known as ‘arrays’.

Arrays are typically homogeneous, storing elements of the same type like integers, strings, etc. Some languages allow heterogeneous arrays with values of different types.

Diagram of an array

Creating arrays

The way to create arrays varies according to the programming language:

MY_ARRAY = ["A", "B", "C"] # array literal
my_array = list(range(5)) # array from range

When creating a literal array its elements are initialized directly. When constructing an empty array its size is specified but its elements are initialized with a default value (0 for numbers, null for objects, etc).

Accessing and modifying elements

Individual elements are quickly accessed by their index using brackets []:

my_array = ['a', 'b', 'c']
print(my_array[0]) # 'a'
print(my_array[2]) # 'c'
my_array[2] = 'z'
print(my_array[2]) # 'z'

Indices start at 0, so in an array of size N, valid indices are between 0 and N-1.

Accessing an invalid index causes an error, for example, accessing index 3 in an array of size 3. This is known as “index out of bounds”.

Traversing an array

We can traverse all elements using a for loop:

letters = ['a', 'b', 'c']

for i in range(len(letters)):
  print(letters[i])

This prints each element in order. len() returns the total length of the array.

Another way is by directly iterating over the elements:

letters = ['a', 'b', 'c']

for letter in letters:
  print(letter)

Searching in an array

We can search for an element in an array using a loop and comparing element by element:

letters = ['a', 'b', 'c']

def search_in_array(array, element):
  for i in range(len(array)):
    if array[i] == element:
      return i
  return False

print(search_in_array(letters, 'b')) # 1
print(search_in_array(letters, 'z')) # False

It returns the index if found or False if not found.

Multidimensional array

Arrays can have more than one dimension, for example, 2D matrices, 3D cubes, etc.

A 2D array can be seen as a table with rows and columns. To access an element two indices are specified, one for the row and one for the column:

matrix = [
  [1, 2, 3],
  [4, 5, 6]
]

print(matrix[0][2]) # 3
print(matrix[1][0]) # 4

They can have more dimensions, for example a 3D array to represent pixels in an image.


Conclusion

Arrays are fundamental data structures in programming that provide efficient access to elements in memory through numeric indices. Having a good command of arrays, matrices, and their uses is essential for any programmer.


2 - Maps (Dictionaries)

Maps (also called hashes or dictionaries) are data structures that associate keys with values. They allow ultra fast access to elements through unique keys. In Python they are known as dictionaries.

A dictionary, or map, consists of a collection of key-value pairs. The key is used to access the associated value. Keys must be unique within a dictionary. Values can be repeated.

Diagram of a map


Main operations

  • Add/update: Inserts a key-value pair. If the key existed, its value is replaced.
    dictionary['key'] = 'value'
    
  • Get value: Accesses the value given a key.
    value = dictionary['key']
    
  • Delete: Removes a key-value pair from the dictionary.
    del dictionary['key']
    
  • Traverse: Iterate over the keys, values or pairs of the dictionary.
    for key in dictionary:
      print(key, dictionary[key]) # key, value
    

Creating a dictionary or map

The syntax for creating maps or dictionaries in Python is:

empty_dictionary = {}

person = {
  'name': 'John',
  'age': 25
}

Usage examples

Dictionaries are useful in many cases. Below are some examples.

Objects and mappings

We can model objects and entities with key-value attributes:

product = {
  'name': 'Smartphone',
  'price': 500,
  'brand': 'XYZ'
}

Counts and frequencies

Counting occurrences of elements in sequences:

text = "Hello world world"

frequencies = {}

for word in text.split():
    if word in frequencies:
        frequencies[word] += 1
    else:
        frequencies[word] = 1

print(frequencies)
# {'Hello': 1, 'world': 2}

Storing and accessing data

As a high performance alternative to lists and arrays.


Conclusion

Dictionaries are versatile data structures thanks to their fast access based on unique keys. They have uses in almost all programs, so mastering dictionaries is essential in any language.


3 - Linked Lists

Linked lists are a dynamic and flexible data structure that allows efficiently storing collections of elements. Unlike arrays, linked lists do not require reserving a fixed size in advance.

A linked list is composed of nodes where each node has two parts:

  • Data or information
  • Reference to the next node

Nodes are organized sequentially, each pointing to the next. The last node points to null to indicate the end.

This dynamic structure allows efficient insertion and deletion of nodes.


Types of linked lists

There are several types:

  • Singly linked: Each node points to the next one. Useful for queues and stacks.

Diagram of a singly linked list

  • Doubly linked: Each node has reference to the next and previous ones. Allows traversing in both directions.

Diagram of a doubly linked list

  • Circular: The last node points to the first forming a cycle. Useful for circular buffers.

Diagram of a circular linked list


Common operations

  • Insert: Add nodes to the beginning, end or middle of the list.

  • Delete: Remove nodes by value or position.

  • Search: Find nodes by value by traversing the list.

  • Traverse: Iterate through nodes by following the references.


Implementation

Linked lists can be implemented as follows:

Use the ListNode class to represent nodes:

class ListNode:
  def __init__(self, value):
    self.value = value
    self.next = None

Then to create and use a list define a LinkedList class with methods for the operations:

class LinkedList:
  def __init__(self):
    self.head = None

  def add_to_start(self, new_value):
    new_node = ListNode(new_value)
    new_node.next = self.head
    self.head = new_node

  def print(self):
    current = self.head
    while current != None:
      print(current.value)
      current = current.next

  def search(self, searched_value):
    current = self.head
    while current != None:
      if current.value == searched_value:
        return True
      current = current.next
    return False

  #...other methods

With this LinkedList class we can create a list, add nodes, print it, search elements, etc.

We could add other methods like insert at end, delete by value, get by index, etc. But this gives an idea of how to encapsulate the linked list functionality in a class. As practice, feel free to try adding these methods on your own. Don’t hesitate to leave your comments and contact me if you need help!


Advantages and disadvantages

Advantages:

  • Inserting and deleting nodes is efficient.
  • Doesn’t require defining a fixed size like arrays.
  • Dynamic, flexible structure.

Disadvantages:

  • More memory usage from having to store references.
  • Accessing elements by index is slower since it is sequential.

Usage examples

  • Implementing structures like stacks and queues.
  • In doubly linked lists, traverse the list in both directions.
  • Blockchains like Bitcoin’s.
  • Playing elements in order like music playlists.

Conclusion

Linked lists are a versatile data structure for storing dynamic sequences of elements. Having a good handle on these lists, their operations, and uses is essential for any programmer.


4 - Stacks

Stacks are an abstract data structure that operates under the LIFO (last in, first out) principle, where the last element to enter is the first to leave.

The LIFO nature of stacks is due to the fact that only the top element can be accessed and manipulated. The operation of placing an element on the stack is known as “push”, while removing an element from the stack is a “pop”. The LIFO operation causes the last element placed in a stack to be the first to be removed.

Diagram of a stack


Main operations

The primary operations supported by a stack structure are:

  • Push: adds an element to the top of the stack.
  • Pop: removes the element at the top of the stack.
  • Peek: allows accessing the top element without removing it from the stack.
  • isEmpty: checks if the stack is empty.

Most languages like Python and Java provide stack implementations in their standard libraries.


Implementation

A stack can be implemented using a linked list so that each node points to the previous node.

class Node:
  def __init__(self, value):
    self.value = value
    self.previous = None


class Stack:
  def __init__(self):
    self.top = None
    self.size = 0

  def push(self, value):
    new_node = Node(value)
    if self.top is None:
      self.top = new_node
    else:
      new_node.previous = self.top
      self.top = new_node
    self.size += 1

  def pop(self):
    if self.top is None:
      return None
    top_node = self.top
    self.top = self.top.previous
    self.size -= 1
    return top_node.value

  def peek(self):
    if self.top is None:
      return None
    return self.top.value

  def is_empty(self):
    return self.top is None  # Returns true if top is None

  def __len__(self):
    return self.size

  def __str__(self):
    values = []
    current = self.top
    while current:
      values.append(str(current.value))
      current = current.previous
    return "\n".join(values)

Usage examples

Stacks have many uses in programming:

  • Execution stack (call stack): records pending function calls to resolve. Implements expected LIFO behaviour.

  • Browser stack: allows going back (undo) in the browser history similarly to a LIFO stack.

  • Math expression execution: stacks can verify parentheses, brackets, braces, etc.

  • Algorithms and data structures: like in the quicksort algorithm and in data path implementations.


Conclusion

Stacks are versatile data structures thanks to their LIFO operation principle. Having a good command of stacks, their uses and applications is essential in computer science.


5 - Queues

Queues are an abstract data structure that operates under the FIFO (first in, first out) principle, where the first element to enter is also the first to leave. Queues are used to order elements so that the first to arrive is processed first. Understanding their operation is essential for any programmer.

The FIFO (first in, first out) nature of queues is because only the initial element can be accessed and manipulated. When an element is added to the queue it is known as “enqueue”, while removing an element is called “dequeue”.

This causes the first element to be added to the queue to also be the first to be removed, hence its FIFO behaviour.

Diagram of a queue


Main operations

The basic queue operations are:

  • Enqueue: Adds an element to the end of the queue.
  • Dequeue: Removes the element from the front of the queue.
  • Peek: Gets the front element without removing it.
  • isEmpty: Checks if the queue is empty.

Implementation

Like stacks, queues can be implemented using linked lists. Elements are added at the end and removed from the front by keeping references to both ends.

class Node:
  def __init__(self, value):
    self.value = value
    self.next = None


class Queue:
  def __init__(self):
    self.front = None
    self.end = None
    self.size = 0

  def enqueue(self, value):
    new_node = Node(value)
    if self.end is None:
      self.end = new_node
      self.front = new_node
      return
    self.end.next = new_node
    self.end = new_node
    self.size += 1

  def dequeue(self):
    if self.is_empty():
      return None
    value = self.front.value
    self.front = self.front.next
    if self.front is None:
      self.end = None
    self.size -= 1
    return value

  def peek(self):
    if self.is_empty():
      return None
    return self.front.value

  def is_empty(self):
    return self.front is None  # Returns true if front is None

  def __len__(self):
    return self.size

  def __str__(self):
    values = []
    current = self.front
    while current:
      values.append(str(current.value))
      current = current.next
    return "\n".join(values)

Usage examples

Some common uses of queues:

  • Print queues where first in, first printed.
  • Task queues in operating systems for execution order.
  • Simulations where arrival order must be respected like in banks.
  • Message queues like RabbitMQ or Kafka.
  • Circular buffers in audio for streaming.

Conclusion

Queues are versatile structures thanks to their FIFO principle. Having a good handle on queues, implementation, and applications will reinforce your skills as a programmer.