Data Structures
Data structures are ways of organizing and storing information in a computer program so that it can be accessed and modified efficiently. As programmers, it is essential to understand the different data structures available and know when to apply each one to optimize the performance and utility of our programs.
A data structure is a particular way of organizing data in the computer’s memory so that it can be used efficiently. Data structures come in many forms, such as arrays, lists, stacks, queues, graphs, trees, hashes, etc.
Each structure organizes the data according to a specific logical model and supports efficient operations to access, modify, add, and delete elements according to that model. For example, an array organizes elements sequentially in memory to facilitate random access by indices. A linked list connects elements in memory using “nodes” with references to the next node to facilitate insertion and deletion.
By choosing the appropriate data structure for the task to be solved, we can write more efficient programs and reduce computational complexity, using fewer resources like memory and processing.
Data structures help us:
Organize large amounts of data to make them easier to access and modify.
Model complex relationships between data, such as with graphs and trees.
Access and modify data efficiently, optimizing performance.
Reuse existing code and data structures instead of having to rewrite solutions from scratch.
For example, a program that needs to store thousands of user records benefits from using a hash data structure to associate each user with data like first name, last name, email, etc. This way specific users can be found very quickly without having to iterate over the entire collection.
Another example are binary search trees, which allow finding elements very quickly in ordered sets of millions of elements. This is achieved by discarding halves of the tree as the desired element is searched.
Types of data structures
There are many types of data structures. Below are some useful categories to classify them.
According to relationship between elements
Linear: Elements are organized sequentially one after the other. For example, arrays, lists, stacks, queues.
Non-linear: Elements are organized in a hierarchy or graph. This is the case with trees and graphs.
According to element type
Homogeneous: Store a single data type. For example, arrays in a language like Java.
Heterogeneous: Allow different data types. Objects, records are examples of this classification.
According to access mode
Sequential access: Elements can only be accessed in sequential order. For example, linked lists.
Direct access: Any element can be accessed directly by its position. Arrays fall into this group.
Associative access: Elements are accessed by an associated key. Dictionaries, hashes fall into this category.
According to functionality
Arrays: Fast access to elements by index but difficult to insert/delete.
Lists: Easy to insert/delete anywhere but sequential access is slow.
Stacks: LIFO (Last In First Out) access. Useful for undo/redo.
Queues: FIFO (First In First Out) access. Useful for event processing.
Trees: Allow modeling hierarchical relationships like file directories or task dependencies.
Graphs: Allow modeling interconnectivity networks like maps, social relationships, etc.
Hashes / Dictionaries: Associate elements with unique keys for ultra fast access.
This classification is not exhaustive but gives an idea of the diversity of data structures and their different properties that allow us to efficiently model complex problems.
Example
Let’s look at a Python example to see how a data structure is created and used. Suppose we want to represent a print queue where prints are processed in order of arrival (FIFO).
First, we define a PrintQueue
class to represent our queue:
class PrintQueue:
def __init__(self):
self.items = []
def enqueue(self, item):
self.items.append(item)
def dequeue(self):
return self.items.pop(0)
def is_empty(self):
return len(self.items) == 0
Then we use it to add prints and process them in order:
print_queue = PrintQueue()
print_queue.enqueue("document1.doc")
print_queue.enqueue("image3.jpg")
print_queue.enqueue("presentation.ppt")
while not print_queue.is_empty():
next_item = print_queue.dequeue()
print(f"Printing {next_item}")
This will print:
Printing document1.doc
Printing image3.jpg
Printing presentation.ppt
With a data structure like the queue we implement FIFO logic in a simple, reusable way. And this is just one example, the possibilities are endless!
Conclusion
Data structures are fundamental programming tools that allow us to optimally organize information to solve complex problems. Knowing the different types of structures available, like arrays, lists, stacks, queues, hashes, graphs, and trees, allows us to build more efficient programs. I hope this introduction has given you some knowledge and tools to start mastering this exciting topic!
More articles
1 - Arrays
Arrays are fundamental data structures in programming that allow storing and organizing collections of data of the same type. Mastering the use of arrays is essential for any programmer.
An array is a data structure that represents a set of elements, which are accessed through contiguous numeric indices ranging from 0 to the size of the array minus 1. Arrays provide fast, direct access to elements based on their position.
In languages like Python and Ruby, arrays are known as ’lists’. In Javascript they are known as ‘arrays’.
Arrays are typically homogeneous, storing elements of the same type like integers, strings, etc. Some languages allow heterogeneous arrays with values of different types.
Creating arrays
The way to create arrays varies according to the programming language:
MY_ARRAY = ["A", "B", "C"] # array literal
my_array = list(range(5)) # array from range
When creating a literal array its elements are initialized directly. When constructing an empty array its size is specified but its elements are initialized with a default value (0 for numbers, null for objects, etc).
Accessing and modifying elements
Individual elements are quickly accessed by their index using brackets []
:
my_array = ['a', 'b', 'c']
print(my_array[0]) # 'a'
print(my_array[2]) # 'c'
my_array[2] = 'z'
print(my_array[2]) # 'z'
Indices start at 0, so in an array of size N, valid indices are between 0 and N-1.
Accessing an invalid index causes an error, for example, accessing index 3 in an array of size 3. This is known as “index out of bounds”.
Traversing an array
We can traverse all elements using a for
loop:
letters = ['a', 'b', 'c']
for i in range(len(letters)):
print(letters[i])
This prints each element in order. len()
returns the total length of the array.
Another way is by directly iterating over the elements:
letters = ['a', 'b', 'c']
for letter in letters:
print(letter)
Searching in an array
We can search for an element in an array using a loop and comparing element by element:
letters = ['a', 'b', 'c']
def search_in_array(array, element):
for i in range(len(array)):
if array[i] == element:
return i
return False
print(search_in_array(letters, 'b')) # 1
print(search_in_array(letters, 'z')) # False
It returns the index if found or False
if not found.
Multidimensional array
Arrays can have more than one dimension, for example, 2D matrices, 3D cubes, etc.
A 2D array can be seen as a table with rows and columns. To access an element two indices are specified, one for the row and one for the column:
matrix = [
[1, 2, 3],
[4, 5, 6]
]
print(matrix[0][2]) # 3
print(matrix[1][0]) # 4
They can have more dimensions, for example a 3D array to represent pixels in an image.
Conclusion
Arrays are fundamental data structures in programming that provide efficient access to elements in memory through numeric indices. Having a good command of arrays, matrices, and their uses is essential for any programmer.
Cheers for making it this far! I hope this journey through the programming universe has been as fascinating for you as it was for me to write down.
We’re keen to hear your thoughts, so don’t be shy – drop your comments, suggestions, and those bright ideas you’re bound to have.
Also, to delve deeper than these lines, take a stroll through the practical examples we’ve cooked up for you. You’ll find all the code and projects in our GitHub repository learn-software-engineering/examples-programming.
Thanks for being part of this learning community. Keep coding and exploring new territories in this captivating world of software!
2 - Maps (Dictionaries)
Maps (also called hashes or dictionaries) are data structures that associate keys with values. They allow ultra fast access to elements through unique keys. In Python they are known as dictionaries.
A dictionary, or map, consists of a collection of key-value pairs. The key is used to access the associated value. Keys must be unique within a dictionary. Values can be repeated.
Main operations
- Add/update: Inserts a key-value pair. If the key existed, its value is replaced.
dictionary['key'] = 'value'
- Get value: Accesses the value given a key.
value = dictionary['key']
- Delete: Removes a key-value pair from the dictionary.
- Traverse: Iterate over the keys, values or pairs of the dictionary.
for key in dictionary:
print(key, dictionary[key]) # key, value
Creating a dictionary or map
The syntax for creating maps or dictionaries in Python is:
empty_dictionary = {}
person = {
'name': 'John',
'age': 25
}
Usage examples
Dictionaries are useful in many cases. Below are some examples.
Objects and mappings
We can model objects and entities with key-value attributes:
product = {
'name': 'Smartphone',
'price': 500,
'brand': 'XYZ'
}
Counts and frequencies
Counting occurrences of elements in sequences:
text = "Hello world world"
frequencies = {}
for word in text.split():
if word in frequencies:
frequencies[word] += 1
else:
frequencies[word] = 1
print(frequencies)
# {'Hello': 1, 'world': 2}
Storing and accessing data
As a high performance alternative to lists and arrays.
Conclusion
Dictionaries are versatile data structures thanks to their fast access based on unique keys. They have uses in almost all programs, so mastering dictionaries is essential in any language.
Cheers for making it this far! I hope this journey through the programming universe has been as fascinating for you as it was for me to write down.
We’re keen to hear your thoughts, so don’t be shy – drop your comments, suggestions, and those bright ideas you’re bound to have.
Also, to delve deeper than these lines, take a stroll through the practical examples we’ve cooked up for you. You’ll find all the code and projects in our GitHub repository learn-software-engineering/examples-programming.
Thanks for being part of this learning community. Keep coding and exploring new territories in this captivating world of software!
3 - Linked Lists
Linked lists are a dynamic and flexible data structure that allows efficiently storing collections of elements. Unlike arrays, linked lists do not require reserving a fixed size in advance.
A linked list is composed of nodes
where each node has two parts:
- Data or information
- Reference to the next node
Nodes are organized sequentially, each pointing to the next. The last node points to null to indicate the end.
This dynamic structure allows efficient insertion and deletion of nodes.
Types of linked lists
There are several types:
- Singly linked: Each node points to the next one. Useful for queues and stacks.
- Doubly linked: Each node has reference to the next and previous ones. Allows traversing in both directions.
- Circular: The last node points to the first forming a cycle. Useful for circular buffers.
Common operations
Insert: Add nodes to the beginning, end or middle of the list.
Delete: Remove nodes by value or position.
Search: Find nodes by value by traversing the list.
Traverse: Iterate through nodes by following the references.
Implementation
Linked lists can be implemented as follows:
Use the ListNode
class to represent nodes:
class ListNode:
def __init__(self, value):
self.value = value
self.next = None
Then to create and use a list define a LinkedList class with methods for the operations:
class LinkedList:
def __init__(self):
self.head = None
def add_to_start(self, new_value):
new_node = ListNode(new_value)
new_node.next = self.head
self.head = new_node
def print(self):
current = self.head
while current != None:
print(current.value)
current = current.next
def search(self, searched_value):
current = self.head
while current != None:
if current.value == searched_value:
return True
current = current.next
return False
#...other methods
With this LinkedList
class we can create a list, add nodes, print it, search elements, etc.
We could add other methods like insert at end, delete by value, get by index, etc. But this gives an idea of how to encapsulate the linked list functionality in a class. As practice, feel free to try adding these methods on your own. Don’t hesitate to leave your comments and contact me if you need help!
Advantages and disadvantages
Advantages:
- Inserting and deleting nodes is efficient.
- Doesn’t require defining a fixed size like arrays.
- Dynamic, flexible structure.
Disadvantages:
- More memory usage from having to store references.
- Accessing elements by index is slower since it is sequential.
Usage examples
- Implementing structures like stacks and queues.
- In doubly linked lists, traverse the list in both directions.
- Blockchains like Bitcoin’s.
- Playing elements in order like music playlists.
Conclusion
Linked lists are a versatile data structure for storing dynamic sequences of elements. Having a good handle on these lists, their operations, and uses is essential for any programmer.
Cheers for making it this far! I hope this journey through the programming universe has been as fascinating for you as it was for me to write down.
We’re keen to hear your thoughts, so don’t be shy – drop your comments, suggestions, and those bright ideas you’re bound to have.
Also, to delve deeper than these lines, take a stroll through the practical examples we’ve cooked up for you. You’ll find all the code and projects in our GitHub repository learn-software-engineering/examples-programming.
Thanks for being part of this learning community. Keep coding and exploring new territories in this captivating world of software!
4 - Stacks
Stacks are an abstract data structure that operates under the LIFO (last in, first out) principle, where the last element to enter is the first to leave.
The LIFO nature of stacks is due to the fact that only the top element can be accessed and manipulated. The operation of placing an element on the stack is known as “push”, while removing an element from the stack is a “pop”. The LIFO operation causes the last element placed in a stack to be the first to be removed.
Main operations
The primary operations supported by a stack structure are:
- Push: adds an element to the top of the stack.
- Pop: removes the element at the top of the stack.
- Peek: allows accessing the top element without removing it from the stack.
- isEmpty: checks if the stack is empty.
Most languages like Python and Java provide stack implementations in their standard libraries.
Implementation
A stack can be implemented using a linked list so that each node points to the previous node.
class Node:
def __init__(self, value):
self.value = value
self.previous = None
class Stack:
def __init__(self):
self.top = None
self.size = 0
def push(self, value):
new_node = Node(value)
if self.top is None:
self.top = new_node
else:
new_node.previous = self.top
self.top = new_node
self.size += 1
def pop(self):
if self.top is None:
return None
top_node = self.top
self.top = self.top.previous
self.size -= 1
return top_node.value
def peek(self):
if self.top is None:
return None
return self.top.value
def is_empty(self):
return self.top is None # Returns true if top is None
def __len__(self):
return self.size
def __str__(self):
values = []
current = self.top
while current:
values.append(str(current.value))
current = current.previous
return "\n".join(values)
Usage examples
Stacks have many uses in programming:
Execution stack (call stack): records pending function calls to resolve. Implements expected LIFO behaviour.
Browser stack: allows going back (undo) in the browser history similarly to a LIFO stack.
Math expression execution: stacks can verify parentheses, brackets, braces, etc.
Algorithms and data structures: like in the quicksort algorithm and in data path implementations.
Conclusion
Stacks are versatile data structures thanks to their LIFO operation principle. Having a good command of stacks, their uses and applications is essential in computer science.
Cheers for making it this far! I hope this journey through the programming universe has been as fascinating for you as it was for me to write down.
We’re keen to hear your thoughts, so don’t be shy – drop your comments, suggestions, and those bright ideas you’re bound to have.
Also, to delve deeper than these lines, take a stroll through the practical examples we’ve cooked up for you. You’ll find all the code and projects in our GitHub repository learn-software-engineering/examples-programming.
Thanks for being part of this learning community. Keep coding and exploring new territories in this captivating world of software!
5 - Queues
Queues are an abstract data structure that operates under the FIFO (first in, first out) principle, where the first element to enter is also the first to leave. Queues are used to order elements so that the first to arrive is processed first. Understanding their operation is essential for any programmer.
The FIFO (first in, first out) nature of queues is because only the initial element can be accessed and manipulated. When an element is added to the queue it is known as “enqueue”, while removing an element is called “dequeue”.
This causes the first element to be added to the queue to also be the first to be removed, hence its FIFO behaviour.
Main operations
The basic queue operations are:
- Enqueue: Adds an element to the end of the queue.
- Dequeue: Removes the element from the front of the queue.
- Peek: Gets the front element without removing it.
- isEmpty: Checks if the queue is empty.
Implementation
Like stacks, queues can be implemented using linked lists.
Elements are added at the end and removed from the front by keeping references to both ends.
class Node:
def __init__(self, value):
self.value = value
self.next = None
class Queue:
def __init__(self):
self.front = None
self.end = None
self.size = 0
def enqueue(self, value):
new_node = Node(value)
if self.end is None:
self.end = new_node
self.front = new_node
return
self.end.next = new_node
self.end = new_node
self.size += 1
def dequeue(self):
if self.is_empty():
return None
value = self.front.value
self.front = self.front.next
if self.front is None:
self.end = None
self.size -= 1
return value
def peek(self):
if self.is_empty():
return None
return self.front.value
def is_empty(self):
return self.front is None # Returns true if front is None
def __len__(self):
return self.size
def __str__(self):
values = []
current = self.front
while current:
values.append(str(current.value))
current = current.next
return "\n".join(values)
Usage examples
Some common uses of queues:
- Print queues where first in, first printed.
- Task queues in operating systems for execution order.
- Simulations where arrival order must be respected like in banks.
- Message queues like RabbitMQ or Kafka.
- Circular buffers in audio for streaming.
Conclusion
Queues are versatile structures thanks to their FIFO principle. Having a good handle on queues, implementation, and applications will reinforce your skills as a programmer.
Cheers for making it this far! I hope this journey through the programming universe has been as fascinating for you as it was for me to write down.
We’re keen to hear your thoughts, so don’t be shy – drop your comments, suggestions, and those bright ideas you’re bound to have.
Also, to delve deeper than these lines, take a stroll through the practical examples we’ve cooked up for you. You’ll find all the code and projects in our GitHub repository learn-software-engineering/examples-programming.
Thanks for being part of this learning community. Keep coding and exploring new territories in this captivating world of software!