Creating a Transformer¶

As previously said, creating a transformer is easy:

from gloe import transformer

@transformer
def filter_even(numbers: list[int]) -> list[int]:
    """Filters out the even numbers from the input list."""
    return [num for num in numbers if num % 2 == 0]

Transformers work like functions, so you can create a function and then apply the @transformer decorator to it. That’s it, transformer created!

Some important things to notice:

We strongly recommend you to type the transformers. Because of Python, it is not mandatory, but Gloe was designed to be used with typed code. Take a look at the Python typing library to learn more about the Python type notation.
Transformers can have multiple parameters, but, under the hoods, they will be treated as a tuple.
Transformers can have no parameters.
Documentations with pydoc will be preserved in transformers.
After applying the @transformer decorator to a function, it becomes an instance of the Transformer class.

Every transformer (instance of the Transformer class) can be called just like a normal function:

filter_even([1, 2, 3, 4, 5, 6]) # returns [2, 4, 6]

Another way to create a transformer is extending from the Transformer class. This is how to implement the above example using a class instead of a function:

from gloe import Transformer

class FilterEven(Transformer[list[int], list[int]]):
    """Filters out the even numbers from the input list."""
    
    def transform(self, numbers: list[int]) -> list[int]:
        return [num for num in numbers if num % 2 == 0]

However, in this case, we first need to instantiate the FilterEven class and then use the instance as a transformer:

filter_even = FilterEven()

filter_even([1, 2, 3, 4, 5, 6]) # returns [2, 4, 6]

This doesn’t seem useful when comparing to the first example, but maybe you would like to split your code into methods of a class in some cases. In fact, we try to avoid using classes to implement transformers to keep its code and responsibility clean and short.

Building a Pipeline¶

You can create a pipeline by composing many transformers into a flow. You can do that by using the right shift operator (>>). For example, consider the filter_even created above, we can create another transformer called square:

@transformer
def square(numbers: list[int]) -> list[int]:
    return [num * num for num in numbers]

It is simple to compose these two transformers sequentially, it means, filter the even numbers and then square each of them:

pipeline = filter_even >> square

Naming things

We call this serial connection

By doing this, the pipeline variable becomes a transformer that executes the processing of the composed transformers sequentially. We can also call it as well:

pipeline([1, 2, 3, 4, 5, 6]) # returns [4, 16, 36]

And you can continue appending transformers to the pipeline, even the ones already present there:

pipeline = filter_even >> square >> square

You are also able to use this entire pipeline as a step of another flow:

pipeline2 = pipeline >> my_transformer

Creating branches¶

If you need to pass the data through two paths without a dependency between them, you can create branches.

Let’s consider the example of a mailing system. We want to send a promotion email to users, but we have two types of users: the subscribed and the unsubscribed ones. We must retrieve the list of users from the database, split the groups and then send the appropriate email to each group:

send_promotion = get_users >> (
    filter_subscribed >> send_subscribed_promotion_email,
    filter_unsubscribed >> send_unsubscribed_promotion_email
)

Naming things

We call this divergent connection.

If it becomes necessary to treat each type of subscription, we can change the graph easily:

send_promotion = get_users >> (
    filter_basic_subscription >> send_basic_subscription_promotion_email,
    filter_premium_subscription >> send_premium_subscription_promotion_email,
    filter_unsubscribed >> send_unsubscribed_promotion_email
)

This example makes it clear how easy it is to understand and refactor the code when using Gloe to express the process as a graph, with each node (transformer) having an atomic and well-defined responsibility.

Important

You should not assume any order of execution between parallel branches.

The right shift operator can receive a transformer or a tuple of transformers as an argument. In the second case, the transformer returned will be as described bellow (pay attention to the types).

Consider the following transformers:

begin: Transformer[In, Mid]
branch1: Transformer[Mid, Out1]
branch2: Transformer[Mid, Out2]
...
branchN: Transformer[Mid, OutN]

Let’s take a look at the type of the transformer returned by a divergent connection:

graph: Transformer[In, tuple[Out1, Out2, ..., OutN]] = begin >> (
    branch1,
    branch2,
    ...,
    branchN
)

The return of each branch will compose a tuple following the respective order of branches.

Of course, we can append a new transformer to the above graph, the only requirement is for the incoming type of this new transformer to be tuple[Out1, Out2, ..., OutN].

end: Transformer[tuple[Out1, Out2, ..., OutN], FinalOut]

graph: Transformer[In, FinalOut] = begin >> (
    branch1,
    branch2,
    ...,
    branchN
) >> end

Naming things

We call this last connection convergent.

The end transformer can be implemented in two ways. Or using a single argument being a tuple:

@transformer
def end(data: tuple[Out1, Out2, ..., OutN]) -> FinalOut:
    ...

Either using multiple arguments, one for each branch:

@transformer
def end(arg1: Out1, arg2: Out2, ..., argN: OutN) -> FinalOut:
    ...

Attention

Python doesn’t provide a generic way to map the outcome type of an arbitrary number of branches on a tuple of arbitrary size. Due to this, the overload of possible sizes was treated one by one until the size 7, it means, considering the typing notation, it is possible to have at most 7 branches currently. That’s is not supposed to be a problem in practice.