Don’t Write Another Line of Code Until You See These Pydantic V2 Breakthrough Features
Data Engineers Beware: Pydantic V2’s Game-Changing Upgrades Are Here!
Introduction
Pydantic is the go-to data validation library for Python. With about 20 million downloads per week, it is among the top 100 Python libraries.
Pydantic V2 was recently released with a Rust backend, and it is 5–50X faster than V1! The release also introduces many breakthrough features, five of which I discuss below.
💡 DISCLAIMER: The LLM robot helped me write the clickbaity title for this article.
Currently using Pydantic V1?
Follow the official migration guide.
1. Validating Functions
Pydantic V2 allows you to validate any function’s arguments! (Yes, including class methods.)
An example function with strict type requirements for input is shown below:
from pydantic import validate_call
from pydantic.types import conint

@validate_call
def echo_hello(n_times: conint(gt=0, lt=11), name: str, loud: bool):
    """
    Greets someone with an echo.

    Args:
        n_times: How many echos. Min value is 1, max is 10.
        name: Name to greet
        loud: Do you want the greeting to be loud?
    """
    greeting = f"Hello {name}!"
    if loud:
        greeting = greeting.upper() + "!!"
    for i in range(n_times):
        print(greeting)

# Call this function
echo_hello(n_times=1, name="Yaakov", loud=True)  # Valid
echo_hello(n_times=10, name="Yaakov", loud=True)  # Valid

# The following will raise an error:
echo_hello(n_times=20, name="Yaakov", loud=True)  # Invalid!
echo_hello(n_times=1, name=1234, loud=True)  # Invalid!
💡 NOTE: This feature lets you remove hand-written argument checks from function bodies, a boon for writing neater and simpler code.
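To see what a failed call looks like to the caller, here is a minimal sketch that catches the ValidationError raised by a validated function and lists the error types. The function repeat_word and the helper error_types are hypothetical names made up for this illustration; the constraint is written with Annotated and Field, which is an alternative spelling of the conint constraint used above.

```python
from typing_extensions import Annotated

from pydantic import Field, ValidationError, validate_call


@validate_call
def repeat_word(word: str, n_times: Annotated[int, Field(gt=0, lt=11)]) -> str:
    """Hypothetical helper: repeat `word` n_times, separated by spaces."""
    return " ".join([word] * n_times)


def error_types(**kwargs) -> list:
    """Call repeat_word and return the type of each validation error, if any."""
    try:
        repeat_word(**kwargs)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


print(error_types(word="hi", n_times=2))   # valid call, no errors
print(error_types(word="hi", n_times=20))  # n_times breaks the lt=11 bound
print(error_types(word=1234, n_times=1))   # word is not a string
```

Each entry in exc.errors() is a dictionary, so the same pattern can feed a log message or an API error response.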
2. Discriminated Unions
A discriminated union is a union type in which a discriminator field determines which member model is used for validation. This is especially useful for FastAPI developers who may need to return different models from an API endpoint, given different request parameters.
An example demonstrating how discriminated unions are performed is shown below:
from typing import Union, Literal, List
from pydantic import BaseModel, Field

class ModelA(BaseModel):
    d_type: Literal["single"]
    value: int = Field(default=0)

class ModelB(ModelA):
    """Inherits from ModelA, making the union challenging"""
    d_type: Literal["many"]
    values: List[int] = Field(default_factory=list)

class ModelC(BaseModel):
    v: Union[ModelA, ModelB] = Field(discriminator="d_type")

# Populate with extra fields, see what happens
m_1 = ModelC(v={"value": 123, "values": [123], "d_type": "single"})
m_2 = ModelC(v={"value": 123, "values": [123], "d_type": "many"})
print(m_1, m_2, sep="\n")
# v=ModelA(d_type='single', value=123)
# v=ModelB(d_type='many', value=123, values=[123])
💡 NOTE: Discriminated unions greatly simplify cases of model inheritance.
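The discriminator also produces a focused error when the tag matches none of the union members. The sketch below assumes three hypothetical models (Cat, Dog, Owner) that are not part of the example above; it shows a valid tag selecting a model and an unknown tag being rejected.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field, ValidationError


class Cat(BaseModel):  # hypothetical model
    pet_type: Literal["cat"]
    lives: int = 9


class Dog(BaseModel):  # hypothetical model
    pet_type: Literal["dog"]
    good_boy: bool = True


class Owner(BaseModel):  # hypothetical model
    pet: Union[Cat, Dog] = Field(discriminator="pet_type")


# A known tag picks the matching model directly
owner = Owner(pet={"pet_type": "dog"})
print(type(owner.pet).__name__)

# An unknown tag yields a single error about the tag itself
try:
    Owner(pet={"pet_type": "fish"})
except ValidationError as exc:
    bad_tag_errors = [err["type"] for err in exc.errors()]
    print(bad_tag_errors)
```

Without a discriminator, a plain Union would instead report one set of errors per member model, which is harder to read.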
🎯 GO DEEPER: Pydantic for Experts: Discriminated Unions in Pydantic V2
3. Validated Types with Annotated Validators
Validation no longer needs to be bound to a model! With Annotated Validators, you can create a validator on a field (i.e. type) directly. (Implicit decorators, be gone!)
This is helpful if your validation is specific to a field, or if you want to share this validation between modules.
Here’s a simple example of a custom type, PositiveNumber, which only accepts positive numbers as input:
from typing_extensions import Annotated
from pydantic.functional_validators import AfterValidator

def validate(v: int) -> int:
    assert v > 0
    # A validator must return the value, otherwise the field becomes None
    return v

PositiveNumber = Annotated[int, AfterValidator(validate)]
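Here is a minimal sketch of such a type in use. It is self-contained, so it redefines the type with a hypothetical validator named check_positive, and it raises ValueError instead of using assert (assert statements are skipped when Python runs with -O). The Order model is also made up for the illustration.

```python
from typing_extensions import Annotated

from pydantic import BaseModel, ValidationError
from pydantic.functional_validators import AfterValidator


def check_positive(v: int) -> int:
    """Hypothetical validator: reject zero and negatives, return the value."""
    if v <= 0:
        raise ValueError("must be positive")
    return v


PositiveNumber = Annotated[int, AfterValidator(check_positive)]


class Order(BaseModel):  # hypothetical model
    quantity: PositiveNumber


# The int coercion runs first, then the "after" validator sees an int
ok = Order(quantity="3")
print(ok.quantity)

try:
    Order(quantity=-1)
except ValidationError as exc:
    error_types = [err["type"] for err in exc.errors()]
    print(error_types)
```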
A larger, more detailed example of a custom type, Price, is shown below:
from typing import Any
from typing_extensions import Annotated
from pydantic import BaseModel
from pydantic.functional_validators import AfterValidator, BeforeValidator

def remove_currency(v: Any) -> Any:
    """Remove currency symbol from any input"""
    if isinstance(v, str):
        v = v.replace('$', '')
    return v

def truncate_max_number(v: int) -> int:
    """Any number greater than 100 will be set at 100"""
    return min(v, 100)

# Create a custom type (importable!)
Price = Annotated[
    int,
    BeforeValidator(remove_currency),
    AfterValidator(truncate_max_number)
]

class Model(BaseModel):
    price: Price

# Instantiate the model to demonstrate
m = Model(price="$12")  # price=12
m = Model(price=12)  # price=12
m = Model(price=101)  # price=100
print(m)
Sharing fields between models & modules:
Annotated Validators unlock the sharing of fields between models and modules, since a custom type carries its validators with it wherever it is imported!
This declarative approach is a major enhancement and a boon for Python developers working with custom types.
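As a rough sketch of that sharing, the block below reuses a single Price-style type in two unrelated models. Everything is defined in one file to keep it runnable; in a real project the type would live in its own module and be imported. The names strip_currency, cap_at_100, Product and Invoice are hypothetical.

```python
from typing import Any

from typing_extensions import Annotated

from pydantic import BaseModel
from pydantic.functional_validators import AfterValidator, BeforeValidator


def strip_currency(v: Any) -> Any:
    """Drop a dollar sign from string input; pass other input through."""
    return v.replace("$", "") if isinstance(v, str) else v


def cap_at_100(v: int) -> int:
    """Clamp values above 100 down to 100."""
    return min(v, 100)


# One definition, which would normally sit in a shared module
Price = Annotated[int, BeforeValidator(strip_currency), AfterValidator(cap_at_100)]


class Product(BaseModel):  # hypothetical model
    price: Price


class Invoice(BaseModel):  # hypothetical model
    total: Price


print(Product(price="$250"))
print(Invoice(total="$40"))
```

Neither model declares a validator of its own, yet both apply the same cleaning and capping rules.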
4. Validation without BaseModel using TypeAdapter
Until now, Pydantic has been BaseModel or bust. 💢 Validation was largely restricted to instances of BaseModel.
V2 introduces TypeAdapter, a special class which allows validation and transformation without creating a BaseModel. (V1 offered only the more limited parse_obj_as helper.)
TypeAdapter for unittests — test fields instead of models
The TypeAdapter is especially useful for writing unittests on specific fields. This decomposition allows for neater + smaller unittests → happier times for you!
An example using TypeAdapter to transform/validate objects into NumberList, a custom type, is shown below:
from typing import List, Any
from typing_extensions import Annotated
import pytest
from pydantic import TypeAdapter
from pydantic.functional_validators import BeforeValidator

def coerce_to_list(v: Any) -> List[Any]:
    if isinstance(v, list):
        return v
    else:
        return [v]

NumberList = Annotated[
    List[int],
    BeforeValidator(coerce_to_list)
]

# Note the spelling: pytest's marker is "parametrize"
@pytest.mark.parametrize(
    ('v', 'expected'),
    [
        pytest.param(1, [1], id="single to list"),
        pytest.param([1, 2, 3], [1, 2, 3], id="list, no change"),
        pytest.param([1, '2'], [1, 2], id="list with string nums"),
    ]
)
def test_number_list(v: Any, expected: List[int]):
    ta = TypeAdapter(NumberList)
    res = ta.validate_python(v)
    assert res == expected
💡 NOTE: Whenever I’m testing behavior of a specific model field, I like to write parameterized unittests which call TypeAdapter. This way, I get a detailed explanation of which values failed.
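TypeAdapter is not limited to tests. As a small sketch, the block below validates Python objects and JSON text against a plain List[int], serializes the result back to JSON, and collects the location of a bad item. The variable names are made up for the illustration.

```python
from typing import List

from pydantic import TypeAdapter, ValidationError

int_list = TypeAdapter(List[int])

# Validate Python objects; numeric strings are coerced in the default lax mode
from_python = int_list.validate_python([1, "2", 3])

# Validate straight from JSON text
from_json = int_list.validate_json("[4, 5, 6]")

# Serialize back to JSON (dump_json returns bytes)
as_json = int_list.dump_json(from_python)

# Errors report where in the input the problem was found
try:
    int_list.validate_python([1, "two"])
except ValidationError as exc:
    failures = [(err["loc"], err["type"]) for err in exc.errors()]

print(from_python, from_json, as_json, failures, sep="\n")
```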
5. Custom Serialization
V2 offers a powerful new way to control the format of the data exported from your Python model.
Say you have a datetime object but want to represent it in different ways when you export your model. With a custom serializer, you can specify exactly that.
An example of a custom field_serializer is shown below:
from datetime import datetime
from pydantic import BaseModel, field_serializer

class BroadwayTicket(BaseModel):
    show_name: str
    show_time: datetime

    @field_serializer("show_time")
    def transform_show_time(self, v: datetime) -> str:
        """Returns human readable show time format"""
        return v.strftime("%b %d, %Y, %I:%M %p")

# Create an object
my_tickets = BroadwayTicket(
    show_name="Parade",
    show_time=datetime(2023, 8, 5, 19)  # August 5, 7:00PM
)
print(my_tickets.model_dump())
# {'show_name': 'Parade', 'show_time': 'Aug 05, 2023, 07:00 PM'}
⚠️ NOTE: Serialization is not a two-way conversion. That is, the exported, serialized format does not automatically deserialize back into the original data type.
There are advanced methods to complete this two-way conversion, but as of now, no easy path. I may write a follow-up article on this.
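One possible way to approximate a round trip, offered as a sketch and not as the definitive answer, is to pair the serializer with a "before" field_validator that parses the same string format on the way in. The constant SHOW_TIME_FORMAT and the method names are assumptions made for this illustration, and the format drops seconds, so anything finer than a minute is lost.

```python
from datetime import datetime
from typing import Any

from pydantic import BaseModel, field_serializer, field_validator

# Assumed display format, shared by the validator and the serializer
SHOW_TIME_FORMAT = "%b %d, %Y, %I:%M %p"


class BroadwayTicket(BaseModel):
    show_name: str
    show_time: datetime

    @field_validator("show_time", mode="before")
    @classmethod
    def parse_show_time(cls, v: Any) -> Any:
        """Parse the display format; leave anything else to pydantic."""
        if isinstance(v, str):
            try:
                return datetime.strptime(v, SHOW_TIME_FORMAT)
            except ValueError:
                return v  # not our format, fall back to default parsing
        return v

    @field_serializer("show_time")
    def format_show_time(self, v: datetime) -> str:
        """Export the show time in the display format."""
        return v.strftime(SHOW_TIME_FORMAT)


ticket = BroadwayTicket(show_name="Parade", show_time=datetime(2023, 8, 5, 19))
exported = ticket.model_dump()
restored = BroadwayTicket(**exported)
print(restored == ticket)
```

The validator returns unrecognized strings unchanged, so ISO 8601 input such as "2023-08-05T19:00:00" still goes through Pydantic's normal datetime parsing.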
Conclusion
Pydantic V2 is a good time. It gives you more control over how your data is validated and serialized. It is blazingly fast (compared to pure-Python libraries). It also offers a ton of powerful new features.
Happy coding!