Image by DALL·E 2. (Prompt: “A futuristic computer performing complex tasks, energy emanates from it”)

Don’t Write Another Line of Code Until You See These Pydantic V2 Breakthrough Features

Data Engineers Beware: Pydantic V2’s Game-Changing Upgrades Are Here!

Yaakov Bressler
Data Engineer Things
5 min read · Aug 1, 2023


Introduction

Pydantic is the go-to data validation library for Python. With about 20 million downloads per week, it is among the top 100 Python libraries.

Pydantic V2 was recently released with a Rust backend, and it is 5–50X faster than V1! The release also introduces many new breakthrough features, several of which I discuss below.

💡 DISCLAIMER:
The LLM robot helped me write the
clickbaity title for this article.

Currently using Pydantic V1?

Follow the official migration guide. This video may be helpful too:

Video Tutorial for migrating from Pydantic V1 to V2.

1. Validating Functions

Pydantic V2 allows you to validate any function’s arguments with the validate_call decorator, which replaces V1’s validate_arguments. (Yes, including class methods.)

An example function with strict type requirements for input is shown below:

from pydantic import validate_call
from pydantic.types import conint


@validate_call
def echo_hello(n_times: conint(gt=0, lt=11), name: str, loud: bool):
    """
    Greets someone with an echo.

    Args:
        n_times: How many echos. Min value is 1, max is 10.
        name: Name to greet
        loud: Do you want the greeting to be loud?
    """
    greeting = f"Hello {name}!"

    if loud:
        greeting = greeting.upper() + "!!"

    for i in range(n_times):
        print(greeting)


# Call this function
echo_hello(n_times=1, name="Yaakov", loud=True) # Valid
echo_hello(n_times=10, name="Yaakov", loud=True) # Valid

# The following will raise an error:
echo_hello(n_times=20, name="Yaakov", loud=True) # Invalid!
echo_hello(n_times=1, name=1234, loud=True) # Invalid!

💡 NOTE: This feature lets you remove hand-written argument checks from function bodies, a boon for writing neater and simpler code.
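When validation fails, the decorated function raises a ValidationError that you can catch and inspect. The sketch below is a minimal illustration: repeat_word is a hypothetical function, and it expresses the range constraint with Annotated and Field rather than conint, which is an assumption about style and not something the example above requires.

```python
from typing_extensions import Annotated

from pydantic import Field, ValidationError, validate_call


@validate_call
def repeat_word(word: str, times: Annotated[int, Field(gt=0, lt=11)]) -> str:
    """Hypothetical helper: joins `word` with itself `times` times."""
    return " ".join([word] * times)


# A valid call passes straight through to the function body
valid_result = repeat_word("hi", times=2)

# An invalid call never reaches the function body
error_types = []
try:
    repeat_word("hi", times=20)
except ValidationError as exc:
    # Each error is a dict describing what failed and where
    error_types = [err["type"] for err in exc.errors()]
```

The errors() list is handy for logging or for turning failures into user-facing messages.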

2. Discriminated Unions

A discriminated union is a union type in which Pydantic uses the value of a discriminator field to decide which member model to validate against. This is especially useful for FastAPI developers who may need to return different models from an API endpoint, given different request parameters.

An example demonstrating how discriminated unions are used is shown below:

from typing import Union, Literal, List

from pydantic import BaseModel, Field


class ModelA(BaseModel):
    d_type: Literal["single"]
    value: int = Field(default=0)


class ModelB(ModelA):
    """Inherits from ModelA, making the union challenging"""
    d_type: Literal["many"]
    values: List[int] = Field(default_factory=list)


class ModelC(BaseModel):
    v: Union[ModelA, ModelB] = Field(discriminator="d_type")


# Populate with extra fields, see what happens
m_1 = ModelC(v={"value": 123, "values": [123], "d_type": "single"})
m_2 = ModelC(v={"value": 123, "values": [123], "d_type": "many"})

print(m_1, m_2, sep="\n")
# v=ModelA(d_type='single', value=123)
# v=ModelB(d_type='many', value=123, values=[123])

💡 NOTE: Discriminated unions greatly simplify cases of model inheritance.

🎯 GO DEEPER: Pydantic for Experts: Discriminated Unions in Pydantic V2
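It can also help to see what happens when the discriminator value matches none of the union members. The following is a small sketch with hypothetical models (Cat, Dog, Owner), not taken from the example above:

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):
    pet_type: Literal["cat"]
    lives: int = 9


class Dog(BaseModel):
    pet_type: Literal["dog"]
    good_boy: bool = True


class Owner(BaseModel):
    # Pydantic reads `pet_type` first, then validates against one model only
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


owner = Owner(pet={"pet_type": "dog"})

error_type = None
try:
    Owner(pet={"pet_type": "fish"})  # no union member uses this tag
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
```

Because only the selected model is tried, the error points at the bad tag instead of listing failures for every member of the union.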

3. Validated Types with Annotated Validators

Validation no longer needs to be bound to a model! With Annotated Validators, you can create a validator on a field (i.e. type) directly. (Implicit decorators, be gone!)

This is helpful if your validation is specific to a field, or if you want to share this validation between modules.

Here’s a simple example of a custom type, PositiveNumber, which only accepts positive numbers as input:

from typing_extensions import Annotated
from pydantic.functional_validators import AfterValidator


def validate(v: int) -> int:
    assert v > 0
    # The validator must return the value, otherwise the field becomes None
    return v

PositiveNumber = Annotated[int, AfterValidator(validate)]

A larger, more detailed example of a custom type, Price, is shown below:

from typing import Any
from typing_extensions import Annotated

from pydantic import BaseModel
from pydantic.functional_validators import AfterValidator, BeforeValidator


def remove_currency(v: Any) -> Any:
    """Remove currency symbol from any input"""
    if isinstance(v, str):
        v = v.replace('$', '')
    # Still a str here if a str came in; Pydantic converts it to int afterwards
    return v

def truncate_max_number(v: int) -> int:
    """Any number greater than 100 will be set at 100"""
    return min(v, 100)


# Create a custom type (importable!)
Price = Annotated[
    int,
    BeforeValidator(remove_currency),
    AfterValidator(truncate_max_number)
]


class Model(BaseModel):
    price: Price


# Instantiate the model to demonstrate
m = Model(price="$12") # price=12
m = Model(price=12) # price=12
m = Model(price=101) # price=100
print(m)

Sharing fields between models & modules:

Annotated Validators unlock the sharing of fields between models and modules, since custom types carry their validators with them!

This declarative approach is a major enhancement and a boon for Python developers working with custom types.
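To make the sharing concrete, here is a minimal sketch in which one annotated type is reused by two unrelated models. The names Quantity, Order and Shipment are hypothetical; in a real project Quantity would live in its own module and be imported wherever it is needed.

```python
from typing_extensions import Annotated

from pydantic import BaseModel, ValidationError
from pydantic.functional_validators import AfterValidator


def must_be_positive(v: int) -> int:
    """Reject zero and negative numbers; return the value unchanged otherwise."""
    if v <= 0:
        raise ValueError("must be positive")
    return v


# Defined once, reusable in any model
Quantity = Annotated[int, AfterValidator(must_be_positive)]


class Order(BaseModel):
    quantity: Quantity


class Shipment(BaseModel):
    boxes: Quantity


order = Order(quantity=3)

error_type = None
try:
    Shipment(boxes=0)  # the same rule applies, with no validator on Shipment
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
```

Neither model defines a validator method of its own; the rule travels with the type.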

4. Validation without BaseModel using TypeAdapter

Until now, Pydantic has been largely BaseModel or bust. 💢 Validation was centered on instances of BaseModel.

V2 introduces TypeAdapter, a class which validates and serializes arbitrary types without creating a BaseModel. (V1 offered only the more limited parse_obj_as helper for this.)
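As a quick orientation, the sketch below wraps a plain List[int] in a TypeAdapter and uses it to validate Python objects, validate JSON, and serialize back to JSON. The variable names are illustrative only.

```python
from typing import List

from pydantic import TypeAdapter, ValidationError

int_list = TypeAdapter(List[int])

# Validate plain Python objects; numeric strings are coerced in the default lax mode
from_python = int_list.validate_python(["1", 2, 3])

# Validate directly from a JSON string
from_json = int_list.validate_json("[1, 2, 3]")

# Serialize back to JSON (returns bytes)
as_json = int_list.dump_json(from_python)

first_error = None
try:
    int_list.validate_python(["not a number"])
except ValidationError as exc:
    first_error = exc.errors()[0]
```

No BaseModel is defined anywhere here; the adapter works on the type alone.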

TypeAdapter for unittests — test fields instead of models

The TypeAdapter is especially useful for writing unittests on specific fields. This decomposition allows for neater + smaller unittests → happier times for you!

An example using TypeAdapter to transform/validate objects into NumberList, a custom type, is shown below:

from typing import List, Any
from typing_extensions import Annotated

import pytest

from pydantic import TypeAdapter
from pydantic.functional_validators import BeforeValidator


def coerce_to_list(v: Any) -> List[Any]:
    if isinstance(v, list):
        return v
    else:
        return [v]


NumberList = Annotated[
    List[int],
    BeforeValidator(coerce_to_list)
]


# Note the spelling: the pytest marker is "parametrize"
@pytest.mark.parametrize(
    ('v', 'expected'),
    [
        pytest.param(1, [1], id="single to list"),
        pytest.param([1, 2, 3], [1, 2, 3], id="list, no change"),
        pytest.param([1, '2'], [1, 2], id="list with string nums"),
    ]
)
def test_number_list(v: Any, expected: List[int]):
    ta = TypeAdapter(NumberList)
    res = ta.validate_python(v)
    assert res == expected

💡 NOTE: Whenever I’m testing behavior of a specific model field, I like to write parameterized unittests which call TypeAdapter. This way, I get a detailed explanation of which values failed.

5. Custom Serialization

V2 offers a powerful new way to transform data formats to and from your Python model.

Say you have a datetime object but want to represent it in different ways when you export your model. With a custom serializer, you can specify exactly that.

An example of a custom field_serializer is shown below:

from datetime import datetime
from pydantic import BaseModel, field_serializer

class BroadwayTicket(BaseModel):
    show_name: str
    show_time: datetime

    @field_serializer("show_time")
    def transform_show_time(self, v: datetime) -> str:
        """Returns human readable show time format"""
        return v.strftime("%b %d, %Y, %I:%M %p")


# Create an object
my_tickets = BroadwayTicket(
    show_name="Parade",
    show_time=datetime(2023, 8, 5, 19)  # August 5, 7:00 PM
)

print(my_tickets.model_dump())
# {'show_name': 'Parade', 'show_time': 'Aug 05, 2023, 07:00 PM'}

⚠️ NOTE: Serialization does not perform “2 way conversion”. That is, the exported “serialized” format does not automatically deserialize back into the original data type.

There are advanced methods to complete this “2 way conversion”, but as of now, no easy path. I may write a follow up article on this.
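One possible way to approximate a round trip, offered as a sketch and not necessarily the approach the author has in mind, is to pair the serializer with a "before" field validator that parses the same format. The Ticket model and the SHOW_TIME_FORMAT constant are hypothetical names introduced for this illustration.

```python
from datetime import datetime
from typing import Any

from pydantic import BaseModel, field_serializer, field_validator

# Assumed shared format, used for both export and import
SHOW_TIME_FORMAT = "%b %d, %Y, %I:%M %p"


class Ticket(BaseModel):
    show_name: str
    show_time: datetime

    @field_validator("show_time", mode="before")
    @classmethod
    def parse_show_time(cls, v: Any) -> Any:
        """Accept the human readable format produced by the serializer."""
        if isinstance(v, str):
            try:
                return datetime.strptime(v, SHOW_TIME_FORMAT)
            except ValueError:
                return v  # let Pydantic's default datetime parsing handle it
        return v

    @field_serializer("show_time")
    def format_show_time(self, v: datetime) -> str:
        return v.strftime(SHOW_TIME_FORMAT)


ticket = Ticket(show_name="Parade", show_time=datetime(2023, 8, 5, 19))
dumped = ticket.model_dump()
restored = Ticket(**dumped)
```

This only works while the serializer and the validator agree on the format, and it drops anything the format cannot express, such as seconds or time zones.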

Conclusion

Pydantic V2 is a good time. It gives you more control over your data. It is blazingly fast (compared to pure-Python libraries). It also offers a ton of powerful new features.

Happy coding!
