Creating a Transformer
As mentioned earlier, creating a transformer is easy:
from gloe import transformer

@transformer
def filter_even(numbers: list[int]) -> list[int]:
    """Keeps only the even numbers from the input list."""
    return [num for num in numbers if num % 2 == 0]
Transformers work like functions, so you can create a function and then apply the @transformer decorator to it. That’s it, transformer created!
Some important things to notice:
We strongly recommend adding type annotations to your transformers. Python does not enforce them, but Gloe was designed to be used with typed code. Take a look at the Python typing library to learn more about the Python type notation.
Transformers must have exactly one parameter. Any additional data the code needs must be bundled into a single structure such as a tuple, a dict, a TypedDict, a dataclass or a namedtuple. We will see later why this is necessary.
Docstrings (as displayed by pydoc) are preserved in transformers.
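A minimal sketch of the single-parameter rule, assuming a hypothetical `ScaleRequest` dataclass and `scale` function (neither is part of Gloe). The `@transformer` decorator is left off so the snippet runs without Gloe installed; with Gloe available, it would be applied to `scale` in the same way as in the first example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRequest:
    """Hypothetical input bundle: everything the step needs, in one object."""

    values: list[int]
    factor: int


def scale(request: ScaleRequest) -> list[int]:
    """Multiplies every value by the requested factor."""
    return [value * request.factor for value in request.values]


result = scale(ScaleRequest(values=[1, 2, 3], factor=10))
print(result)  # [10, 20, 30]
```

Bundling the inputs this way keeps the function at one parameter while still giving each field a name and a type.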
After applying the @transformer decorator to a function, it becomes an instance of the Transformer class.
Every transformer (instance of the Transformer class) can be called just like a normal function:
filter_even([1, 2, 3, 4, 5, 6]) # returns [2, 4, 6]
Another way to create a transformer is extending from the Transformer class. This is how to implement the above example using a class instead of a function:
from gloe import Transformer

class FilterEven(Transformer[list[int], list[int]]):
    """Keeps only the even numbers from the input list."""

    def transform(self, numbers: list[int]) -> list[int]:
        return [num for num in numbers if num % 2 == 0]
However, in this case, we first need to instantiate the FilterEven class and then use the instance as a transformer:
filter_even = FilterEven()
filter_even([1, 2, 3, 4, 5, 6]) # returns [2, 4, 6]
This may not seem useful compared to the first example, but in some cases you may want to split your code into methods of a class. In general, we try to avoid using classes to implement transformers, to keep their code and responsibility clean and short.
Building a Pipeline
You can create a pipeline by composing transformers into a flow, using the right shift operator (>>). For example, alongside the filter_even transformer created above, we can create another transformer called square:
@transformer
def square(numbers: list[int]) -> list[int]:
    return [num * num for num in numbers]
It is simple to compose these two transformers sequentially, that is, to filter the even numbers and then square each of them:
pipeline = filter_even >> square
Naming things
We call this a serial connection.
By doing this, the pipeline variable becomes a transformer that runs the composed transformers sequentially. It can be called like any other transformer:
pipeline([1, 2, 3, 4, 5, 6]) # returns [4, 16, 36]
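To make the idea behind a serial connection concrete, here is a toy model of how a `>>` operator can be built with Python's `__rshift__` hook. The `Step` class is a hypothetical stand-in written for this illustration; it is not Gloe's implementation and has none of its type checking.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a transformer: wraps a one-argument function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __call__(self, data: Any) -> Any:
        return self.func(data)

    def __rshift__(self, other: "Step") -> "Step":
        # `self >> other` builds a new step that runs self first, then other.
        return Step(lambda data: other(self(data)))


filter_even = Step(lambda numbers: [n for n in numbers if n % 2 == 0])
square = Step(lambda numbers: [n * n for n in numbers])

pipeline = filter_even >> square
print(pipeline([1, 2, 3, 4, 5, 6]))  # [4, 16, 36]

# The result is itself a step, so it can be extended with further steps.
longer = pipeline >> square
print(longer([1, 2, 3, 4, 5, 6]))  # [16, 256, 1296]
```

Because `>>` returns another `Step`, composition nests freely, which is the property that lets a whole pipeline act as a single step in a larger flow.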
And you can continue appending transformers to the pipeline, even the ones already present there:
pipeline = filter_even >> square >> square
You are also able to use this entire pipeline as a step of another flow:
pipeline2 = pipeline >> my_transformer
Creating branches
If you need to pass the data through two paths without a dependency between them, you can create branches.
Let’s consider the example of a mailing system. We want to send a promotion email to users, but we have two types of users: the subscribed and the unsubscribed ones. We must retrieve the list of users from the database, split the groups and then send the appropriate email to each group:
send_promotion = get_users >> (
    filter_subscribed >> send_subscribed_promotion_email,
    filter_unsubscribed >> send_unsubscribed_promotion_email
)
Naming things
We call this a divergent connection.
If it becomes necessary to handle each subscription type separately, we can change the graph easily:
send_promotion = get_users >> (
    filter_basic_subscription >> send_basic_subscription_promotion_email,
    filter_premium_subscription >> send_premium_subscription_promotion_email,
    filter_unsubscribed >> send_unsubscribed_promotion_email
)
This example makes it clear how easy it is to understand and refactor the code when using Gloe to express the process as a graph, with each node (transformer) having an atomic and well-defined responsibility.
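For readers who want to see what the two-branch mailing graph computes, the sketch below spells out a rough plain-Python equivalent. The function names come from the example above, but their bodies, the `User` type and the sample data are all assumptions made for illustration; nothing is sent, and no Gloe code is involved.

```python
from typing import TypedDict


class User(TypedDict):
    email: str
    subscribed: bool


def get_users() -> list[User]:
    # Hypothetical in-memory stand-in for the database query.
    return [
        {"email": "ana@example.com", "subscribed": True},
        {"email": "bob@example.com", "subscribed": False},
    ]


def filter_subscribed(users: list[User]) -> list[User]:
    return [user for user in users if user["subscribed"]]


def filter_unsubscribed(users: list[User]) -> list[User]:
    return [user for user in users if not user["subscribed"]]


def send_subscribed_promotion_email(users: list[User]) -> list[str]:
    # Returns the messages instead of sending them, to keep the sketch offline.
    return [f"Subscriber offer -> {user['email']}" for user in users]


def send_unsubscribed_promotion_email(users: list[User]) -> list[str]:
    return [f"Come back offer -> {user['email']}" for user in users]


def send_promotion() -> tuple[list[str], list[str]]:
    users = get_users()
    # Both branches receive the same `users` value and are independent.
    return (
        send_subscribed_promotion_email(filter_subscribed(users)),
        send_unsubscribed_promotion_email(filter_unsubscribed(users)),
    )


subscribed_mail, unsubscribed_mail = send_promotion()
print(subscribed_mail)    # ['Subscriber offer -> ana@example.com']
print(unsubscribed_mail)  # ['Come back offer -> bob@example.com']
```

Compared with this hand-written version, the graph form keeps the wiring in one expression, so adding a branch means adding one line to the tuple rather than editing a function body.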
Important
You should not assume any order of execution between branches.
The right shift operator can receive a transformer or a tuple of transformers as its right operand. In the second case, the transformer returned will be as described below (pay attention to the types).
Consider the following transformers:
begin: Transformer[In, Mid]
branch1: Transformer[Mid, Out1]
branch2: Transformer[Mid, Out2]
...
branchN: Transformer[Mid, OutN]
Let’s take a look at the type of the transformer returned by a divergent connection:
graph: Transformer[In, tuple[Out1, Out2, ..., OutN]] = begin >> (
    branch1,
    branch2,
    ...,
    branchN
)
The return values of the branches are collected into a tuple, in the same order as the branches.
Of course, we can append a new transformer to the above graph. The only requirement is that the incoming type of this new transformer is tuple[Out1, Out2, ..., OutN].
end: Transformer[tuple[Out1, Out2, ..., OutN], FinalOut]
graph: Transformer[In, FinalOut] = begin >> (
    branch1,
    branch2,
    ...,
    branchN
) >> end
Naming things
We call this last connection convergent.
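The divergent and convergent connections can be imitated with a small toy class. The sketch below assumes a hypothetical `Step` wrapper whose `__rshift__` accepts either one step or a tuple of steps; it is not Gloe's implementation. It runs branches one after another, whereas with Gloe you should not rely on any particular execution order.

```python
from typing import Any, Callable, Tuple, Union


class Step:
    """Toy stand-in for a transformer; not Gloe's implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __call__(self, data: Any) -> Any:
        return self.func(data)

    def __rshift__(self, other: Union["Step", Tuple["Step", ...]]) -> "Step":
        if isinstance(other, tuple):
            def diverge(data: Any) -> Tuple[Any, ...]:
                # Compute the intermediate value once, hand it to every
                # branch, and collect the results in branch order.
                mid = self(data)
                return tuple(branch(mid) for branch in other)

            return Step(diverge)
        # Serial connection: run self, then other.
        return Step(lambda data: other(self(data)))


begin = Step(lambda text: [int(part) for part in text.split(",")])
smallest = Step(min)
largest = Step(max)
total = Step(sum)
# `end` receives the tuple produced by the three branches.
end = Step(lambda stats: {"min": stats[0], "max": stats[1], "sum": stats[2]})

graph = begin >> (smallest, largest, total)
print(graph("3,1,2"))  # (1, 3, 6)

full = begin >> (smallest, largest, total) >> end
print(full("3,1,2"))  # {'min': 1, 'max': 3, 'sum': 6}
```

The position of each value in the tuple matches the position of its branch, which is why `end` can safely index `stats[0]`, `stats[1]` and `stats[2]`.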
Attention
Python doesn’t provide a generic way to map the outcome types of an arbitrary number of branches onto a tuple of arbitrary size. Because of this, the overloads were written one by one for each tuple size up to 7. In terms of typing, this means a divergent connection can currently have at most 7 branches.
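The size-by-size approach can be illustrated with `typing.overload`. The sketch below uses a hypothetical `fan_out` helper with overloads for two and three branches only; it shows the general technique under those assumptions and is not taken from Gloe's source.

```python
from typing import Callable, Tuple, TypeVar, overload

Mid = TypeVar("Mid")
Out1 = TypeVar("Out1")
Out2 = TypeVar("Out2")
Out3 = TypeVar("Out3")


@overload
def fan_out(
    b1: Callable[[Mid], Out1],
    b2: Callable[[Mid], Out2],
) -> Callable[[Mid], Tuple[Out1, Out2]]:
    ...


@overload
def fan_out(
    b1: Callable[[Mid], Out1],
    b2: Callable[[Mid], Out2],
    b3: Callable[[Mid], Out3],
) -> Callable[[Mid], Tuple[Out1, Out2, Out3]]:
    ...


def fan_out(*branches):
    # Single runtime implementation shared by every overload above.
    def run(value):
        return tuple(branch(value) for branch in branches)

    return run


def count(numbers: list[int]) -> int:
    return len(numbers)


def describe(numbers: list[int]) -> str:
    return f"{len(numbers)} items"


pair = fan_out(count, describe)
print(pair([1, 2, 3]))  # (3, '3 items')
```

Each overload gives a type checker a precise signature for one tuple size, so supporting one more branch means writing one more overload. Calls with more branches than there are overloads still run, but they have no matching signature for the checker.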