6. Iterable, Iterator, Generator
In Python, Iterables and Iterators might seem similar, but they serve distinct purposes and have unique characteristics.
An Iterable is an object that can be looped over. This means you can go through its elements one by one, typically using a for loop. Lists, tuples, strings, and dictionaries are all examples of Iterables. They need to implement an __iter__() method that returns an Iterator, or a __getitem__() method for indexed access.
An Iterator, on the other hand, is an object that keeps track of its current state during iteration. It must have a __next__() method, which returns the next item in the sequence and moves forward, and an __iter__() method that returns self. One key aspect of Iterators is that they can only move forward; you can't get an item that has already been iterated over unless you create a new Iterator.
So, the primary difference is that while both can be iterated over, an Iterator also keeps track of its current position in the Iterable. While all Iterators are Iterables (because they implement an __iter__() method), not all Iterables are Iterators (because they do not provide a __next__() method). The power of Iterators comes from their ability to provide items one at a time rather than storing all items in memory at once, which is especially useful when working with large data sets.
A generator is a specific type of iterator that lets us implement an iterator in a clear and concise way. It is created by a special kind of function (a generator function) that returns an iterator, which we can then iterate over to obtain a sequence of values.
The main difference between a function and a generator in Python is the presence of the yield keyword. While a return statement completely finishes a function execution, yield produces a value and suspends the function’s execution. The function can then be resumed from where it left off, allowing the function to produce a series of values over time, rather than computing them all at once and sending them back.
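The suspend-and-resume behaviour described above can be sketched with a tiny generator. The function name countdown and the variable names are hypothetical and used only for illustration:

```python
def countdown(n):
    """Hypothetical generator used only for illustration."""
    while n > 0:
        yield n   # hand back a value and pause here
        n -= 1    # execution resumes on this line at the next request


gen = countdown(3)
first = next(gen)    # runs the body until the first yield
second = next(gen)   # resumes after the yield, loops, and pauses again
rest = list(gen)     # drains whatever is left
print(first, second, rest)  # 3 2 [1]
```

Each call to next() runs the function body only as far as the next yield, which is why the values arrive one at a time.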
6.1. Iterable
The following data types are Iterables in Python:
- Lists
- Tuples
- Strings
- Dictionaries
- Sets
- Generators
In Python, an Iterable is an object that can be iterated (looped) over. Essentially, if an object has an __iter__() method, it is an Iterable.
You can use dir() to check if an object is iterable.
Here's an example:
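A minimal sketch of such a loop, assuming the same pets list that is used later in this chapter:

```python
pets = ['Tub', 'Furrytail', 'Cat', 'Barkalot']

# A list is an Iterable, so a for loop can walk through it directly.
for pet in pets:
    print(pet)
```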
In this case, pets is a list, and lists in Python are Iterables. We can loop over the list using a for loop, and it will print each pet's name one at a time.
6.1.1. Checking if an Object is Iterable
You can check whether an object is iterable by using Python's built-in dir() function. If __iter__ appears in the output, the object is iterable.
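A short sketch of this check, assuming the pets list from above. The helper is_iterable and the class Letters are hypothetical names introduced only for illustration. They show a caveat: an object that defines only __getitem__() can still be iterated, so trying iter() is a more thorough test than looking for __iter__ in dir().

```python
pets = ['Tub', 'Furrytail', 'Cat', 'Barkalot']

# The dir() check described above.
list_has_iter = '__iter__' in dir(pets)   # True: lists are iterable
int_has_iter = '__iter__' in dir(42)      # False: integers are not


def is_iterable(obj):
    """Hypothetical helper: ask iter() directly instead of inspecting dir()."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class Letters:
    """Hypothetical class with no __iter__, only __getitem__."""

    def __getitem__(self, index):
        return 'abc'[index]   # raises IndexError past the end, which stops iteration


print(list_has_iter, int_has_iter)
print('__iter__' in dir(Letters()), is_iterable(Letters()))
```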
6.2. Iterator
An Iterator, on the other hand, is an object with a state that remembers where it is during iteration. While all Iterator objects are Iterable, not all Iterables are Iterators.
An Iterator object is created with the built-in iter() function, and the built-in next() function is used to fetch each item. It is important to note that an Iterator can only move forward; it cannot go backward.
Let's turn our list of pets into an Iterator:
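A minimal sketch of what happens when next() is called directly on the list, assuming the pets list used elsewhere in this chapter. The try/except is only there so that the snippet runs to completion:

```python
pets = ['Tub', 'Furrytail', 'Cat', 'Barkalot']

try:
    next(pets)            # a list is an Iterable, not an Iterator
except TypeError as err:
    message = str(err)
    print(message)        # 'list' object is not an iterator
```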
Here we get an error because pets is a list, and lists are not Iterators. We can, however, turn it into an Iterator using the iter() method:
Or a better way to do it is:
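A minimal sketch of both ways, assuming the first refers to calling the list's __iter__() method directly and the second to the built-in iter() function:

```python
pets = ['Tub', 'Furrytail', 'Cat', 'Barkalot']

# Calling the special method directly works, but is rarely written this way.
iterator_obj = pets.__iter__()

# The built-in iter() does the same thing and is the idiomatic form.
iterator_obj = iter(pets)

print(type(iterator_obj).__name__)          # list_iterator
print(iter(iterator_obj) is iterator_obj)   # True: an Iterator returns itself
```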
Now, iterator_obj is an Iterator. You can't use next() directly on the list pets (you would get a TypeError), but you can on iterator_obj. Let's print the names one by one:
print(next(iterator_obj))
print(next(iterator_obj))
print(next(iterator_obj))
print(next(iterator_obj))
print(next(iterator_obj)) # This will raise StopIteration error
Calling next() again once every item has been returned results in a StopIteration error, as there are no more items to iterate over. This is how Python signals the end of an Iterator.
6.2.1. Handling StopIteration
We can handle the StopIteration error elegantly using a try/except block. Here's how to loop over all the elements of an Iterator, stopping cleanly when there are no more items:
pets = ['Tub', 'Furrytail', 'Cat', 'Barkalot']
iterator_obj = iter(pets)
while True:
    try:
        next_obj = next(iterator_obj)
        print(next_obj)
    except StopIteration:
        break
This will print out each pet's name, just like the for loop did earlier, but this time we're using the Iterator's next() method.
When we reach the end of the Iterator and a StopIteration exception is raised, our except clause catches it and we break out of the loop, preventing any errors.
NOTE: Here is one example of how to create an Iterator class; we will cover more details in Chapter 7.
Creating a Custom Iterator
Let's look at an example: the NumIter class, which is a simple iterator that counts numbers within a given range.
class NumIter:
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        else:
            return_val = self.current
            self.current += 1
            return return_val
- The __init__ method initializes the iterator. It takes a start and an end as arguments, which determine the range of numbers that the iterator will cover. self.current is used to keep track of the current number in the sequence.
- The __iter__ method is what makes this class an Iterable. It returns self, indicating that an instance of the class is its own iterator.
- The __next__ method is the heart of an Iterator. It returns the next value in the sequence each time it's called, and increments self.current to prepare for the next call. When there are no more numbers left in the sequence, it raises the StopIteration exception, signalling that all values have been returned.
Now, let's see how we can use this NumIter class:
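A minimal usage sketch follows. The class is repeated so that the snippet runs on its own, and the variable name nums is a hypothetical choice. The second loop is included to show what an exhausted Iterator does:

```python
class NumIter:
    """Same counting iterator as above, repeated so this snippet is self-contained."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        return_val = self.current
        self.current += 1
        return return_val


nums = NumIter(1, 5)   # `nums` is a hypothetical variable name

for num in nums:
    print(num)         # prints 1, 2, 3, 4 on separate lines

for num in nums:       # the iterator is exhausted by now
    print(num)         # never runs
```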
In this code, we create an instance of NumIter that starts at 1 and ends at 5. We then use a for loop to iterate over it. This will output the numbers 1 to 4, one per line.
Note that after you've iterated over an Iterator, it's "exhausted" and you can't iterate over it again. So if you try to use the for loop with the same instance again, it won't print anything.
However, if we create a new instance, we can manually iterate over it using the next() function:
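A minimal sketch of manual iteration with a fresh instance. The class is repeated so the snippet is self-contained, and nums is a hypothetical variable name. The last call passes a default value to next(), which is returned instead of raising StopIteration:

```python
class NumIter:
    """Same counting iterator as above, repeated so this snippet is self-contained."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        return_val = self.current
        self.current += 1
        return return_val


nums = NumIter(1, 3)          # a fresh, unexhausted iterator
print(next(nums))             # 1
print(next(nums))             # 2
print(next(nums, 'done'))     # done (the default replaces StopIteration)
```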
6.3. Generator
A generator is a special type of iterator in Python. Generators don't store all their values in memory, but generate them on the fly. This makes them more memory efficient, especially when dealing with large data sets. You also don't need to write the __iter__() and __next__() methods yourself; Python provides them automatically.
6.3.1. Creating a Generator
A generator function looks very much like a regular function, but instead of returning a value, it yields it. Here's an example:
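A minimal sketch of such a generator function, assuming the name iter_nums and the start and end parameters described in the text; the body is one plausible way to write it:

```python
def iter_nums(start, end):
    """Yield the numbers from start up to, but not including, end."""
    current = start
    while current < end:
        yield current      # produce one value, then pause until asked again
        current += 1


gen = iter_nums(1, 5)
print(gen)   # a generator object; none of the function body has run yet
```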
In this example, iter_nums is a generator function. It takes a start and an end argument and yields numbers from start up to, but not including, end.
We can iterate over this generator using the next() function, or we can use a for loop:
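Both ways of consuming the generator can be sketched together. The iter_nums body here is an assumed implementation matching the description above:

```python
def iter_nums(start, end):
    """Assumed implementation: yield start, start + 1, ..., end - 1."""
    current = start
    while current < end:
        yield current
        current += 1


gen = iter_nums(1, 4)
print(next(gen))   # 1
print(next(gen))   # 2

for num in gen:    # the loop continues where next() left off
    print(num)     # 3

for num in iter_nums(1, 4):   # a new generator starts from the beginning
    print(num)                # 1, 2, 3 on separate lines
```

The for loop handles StopIteration for us, so no try/except is needed.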
A generator function is more readable than the iterator class we created previously.
Another example
numbers = [1, 2, 3, 4, 5]

# Now we have a list of numbers, and we want to double each number in the list
def double_nums(nums):
    output = []
    for num in nums:
        output.append(num*2)
    return output

output_nums = double_nums(numbers)
print(output_nums)  # the full list of doubled numbers

The same function written as a generator:

numbers = [1, 2, 3, 4, 5]

def double_nums(nums):
    for num in nums:
        yield num*2

output_nums = double_nums(numbers)
print(output_nums)  # a generator object, not the values themselves
To get the values, you can use next(), or output them as a list:
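A minimal sketch of both options, reusing the double_nums generator from above so that the snippet runs on its own:

```python
def double_nums(nums):
    for num in nums:
        yield num * 2


numbers = [1, 2, 3, 4, 5]
output_nums = double_nums(numbers)

print(next(output_nums))   # 2
print(next(output_nums))   # 4

# list() consumes whatever the generator has not produced yet.
print(list(output_nums))   # [6, 8, 10]

# A fresh generator converted in one go gives the full result.
print(list(double_nums(numbers)))   # [2, 4, 6, 8, 10]
```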
Of course, you can also rewrite it as a list comprehension, as we mentioned in Chapter 5:
numbers = [1, 2, 3, 4, 5]
output_nums_list = [num*2 for num in numbers]
print(output_nums_list)
6.3.2. Generator Expressions
If you replace the square brackets with parentheses, you get a generator object instead of a list:
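A minimal sketch, assuming the numbers list from the list comprehension example; the name output_nums_gen is a hypothetical choice:

```python
numbers = [1, 2, 3, 4, 5]

output_nums_gen = (num * 2 for num in numbers)
print(output_nums_gen)          # a generator object, not a list
print(list(output_nums_gen))    # [2, 4, 6, 8, 10]

# When a generator expression is the only argument of a call,
# the extra pair of parentheses can be dropped.
print(sum(num * 2 for num in numbers))   # 30
```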
In this case, (num * 2 for num in numbers) is a generator expression that generates doubled numbers.
6.3.3. Generator vs Iterator
In Python, both an iterator and a generator can be used to iterate over a sentence, word by word. They provide a convenient way to process each word individually. Let's see how we can create an iterator and a generator to do this.
Creating a Sentence Iterator
First, let's create an iterator. The iterator class, SentIter, will split the sentence into words and return each word one by one:
class SentIter:
    def __init__(self, sentence):
        self.sentence = sentence
        self.index = 0
        self.words = sentence.split()

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.words):
            raise StopIteration
        else:
            return_val = self.words[self.index]
            self.index += 1
            return return_val
Creating a Sentence Generator
Now, let's see how we can do the same thing with a generator. We'll write a function, iter_sent, that takes a sentence, splits it into words, and yields each word:
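A minimal sketch of such a function, assuming the name iter_sent from the text; the sample sentence is made up for illustration:

```python
def iter_sent(sentence):
    """Yield the words of a sentence one at a time."""
    for word in sentence.split():
        yield word


for word in iter_sent('This is a test'):
    print(word)   # This, is, a, test on separate lines
```

The generator needs no index bookkeeping and no explicit StopIteration: the function simply ends when the loop over the words is finished.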
6.3.4. Generator Performance
In Python, a generator is a simpler and more memory-efficient alternative to a list, especially when the list is large. To demonstrate the difference, let's compare the memory usage and execution time of a list and a generator.
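Firstly, let's import the necessary modules. We'll use random for generating random data, time for measuring execution time, and psutil and os for measuring memory usage:
import os
import random
import time

import psutil
psutil is a third-party package. Installation of psutil via conda:
conda install psutil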
Performance Comparison: List vs Generator in Python
I adapted the code, with minor updates, from Corey to compare the performance of a list and a generator.
Then, let's define a function to measure the current memory usage:
def memory_usage_psutil():
    # Resident set size of the current process, converted from bytes to megabytes
    process = psutil.Process(os.getpid())
    mem = process.memory_info().rss / float(2 ** 20)
    return mem
names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']
Before generating the data, we'll check and print the memory usage:
print('Memory (Before): {}Mb'.format(memory_usage_psutil()))
Now let's define a function to generate a list of people:
def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result
And a generator function that yields the same people one at a time:
def people_generator(num_people):
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person
# Use people_list for comparison
t1 = time.process_time()
people = people_list(1000000)
t2 = time.process_time()
print('Memory (After) : {}Mb'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2-t1))
The measurement comes to 223.44140625Mb after generating the list of people. Now let's see how the generator performs:
# Use people_generator for comparison
t1 = time.process_time()
people = people_generator(1000000)
t2 = time.process_time()
print('Memory (After) : {}Mb'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2-t1))
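The measurement comes to only 0.0078125Mb after creating the generator object. This is because the generator doesn't store the data in memory. Instead, it yields each person one by one, and only when asked: calling people_generator(1000000) merely creates the generator object, so no person has been built yet when the memory and time are measured. This is why the memory usage of the generator is much lower than that of the list.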
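For a comparison that needs no third-party package, the same idea can be sketched with the standard library's tracemalloc module instead of psutil. This swaps in a different measurement technique: tracemalloc reports memory allocated by Python objects, not the process RSS, so the numbers will not match the ones above. The helpers make_person and peak_mib are hypothetical names, and the data set is smaller to keep the run short. Unlike the timing above, both variants are fully consumed here, so the generator's work is included.

```python
import random
import tracemalloc

names = ['John', 'Corey', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']


def make_person(i):
    """Hypothetical helper that builds one person record."""
    return {'id': i, 'name': random.choice(names), 'major': random.choice(majors)}


def peak_mib(build):
    """Hypothetical helper: consume build() fully and return (item count, peak MiB)."""
    tracemalloc.start()
    count = sum(1 for _ in build())             # iterate over every item
    _, peak = tracemalloc.get_traced_memory()   # returns (current, peak) in bytes
    tracemalloc.stop()
    return count, peak / 2 ** 20


num_people = 50_000

# The list holds every person at once; the generator holds one at a time.
list_count, list_peak = peak_mib(lambda: [make_person(i) for i in range(num_people)])
gen_count, gen_peak = peak_mib(lambda: (make_person(i) for i in range(num_people)))

print('List:      {} people, peak {:.2f} MiB'.format(list_count, list_peak))
print('Generator: {} people, peak {:.2f} MiB'.format(gen_count, gen_peak))
```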