Python @dataclass vs Manual __init__ Boilerplate

Every Python class that stores data needs an __init__ method to set its attributes. Add __repr__ for readable output, __eq__ for meaningful comparisons, and the boilerplate multiplies fast. The @dataclass decorator, introduced in Python 3.7 via PEP 557 (authored by Eric V. Smith), generates all three automatically from type-annotated fields using the variable annotation syntax defined in PEP 526. It also offers parameters for ordering, immutability, and memory optimization that would require dozens of additional lines to write by hand. This Python tutorial puts the two approaches side by side at every level of complexity so you can see exactly what the decorator replaces and when the manual approach is still the right choice.

How to read this tutorial

Each section builds directly on the last. Read the first section first — it establishes exactly what @dataclass replaces, which gives every parameter in the later sections its context. Two interactive challenges are embedded mid-tutorial: a Check Your Understanding quiz after the field() section, and a Spot the Bug challenge after the validation section. Try them before reading the answer explanation — retrieval practice improves retention far more than passive re-reading.

The @dataclass decorator is a class decorator. Unlike function decorators that wrap a function with a new function, @dataclass inspects the class definition, reads its annotated fields, and adds generated methods directly onto the class. The class itself is returned unchanged in structure; it gains new methods without being replaced by a wrapper. Understanding @dataclass as a decorator that modifies a class in place connects it to the broader decorator concepts covered throughout this series.
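Because the decorator syntax is just a function call, you can apply dataclass() manually and verify that it returns the very same class object. A minimal sketch (the Point class here is a hypothetical example, not from the tutorial):

```python
from dataclasses import dataclass, is_dataclass


# A plain class with annotated fields, used only for this demonstration
class Point:
    x: int
    y: int


# Applying the decorator by hand: an ordinary function call
Decorated = dataclass(Point)

print(Decorated is Point)          # True — same class object, no wrapper
print(is_dataclass(Point))         # True
print("__init__" in vars(Point))   # True — generated method added in place
print(Point(1, 2) == Point(1, 2))  # True — generated __eq__ works too
```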

How to Replace Manual __init__ with @dataclass

The steps below walk through converting a hand-written class to a dataclass. Each step maps directly to a section in this tutorial.

  1. Import the decorator. Add from dataclasses import dataclass at the top of your file. The dataclasses module ships with the Python standard library and requires no installation.
  2. Replace field assignments with annotated class variables. Remove the entire __init__ method. For each self.field = field assignment, write a type-annotated class variable at the class body level: title: str. The decorator generates __init__, __repr__, and __eq__ from these annotations automatically.
  3. Apply the decorator. Place @dataclass on the line immediately above the class definition. It reads the annotated fields and generates methods at class-definition time.
  4. Handle mutable defaults with field(default_factory=...). Lists, dicts, and other mutable objects cannot be assigned as direct defaults. Replace them with field(default_factory=list) or field(default_factory=dict). Direct mutable defaults raise ValueError at class-definition time.
  5. Move validation into __post_init__. Any validation that lived inside your original __init__ moves to a __post_init__ method. This method runs after the generated __init__ finishes assigning all fields. Raise ValueError or TypeError inside it to reject invalid data.
  6. Enable optional features with decorator parameters. Pass frozen=True for immutable, hashable instances. Pass order=True for all four comparison operators. Pass slots=True (Python 3.10+) to reduce memory usage by eliminating the per-instance __dict__.
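The steps above can be compressed into one small sketch. The Task class below is hypothetical and exists only to show each step in place:

```python
from dataclasses import dataclass, field             # step 1: import


@dataclass                                           # step 3: apply decorator
class Task:
    name: str                                        # step 2: annotated field
    priority: int = 1
    tags: list = field(default_factory=list)         # step 4: mutable default

    def __post_init__(self):                         # step 5: validation
        if self.priority < 1:
            raise ValueError(f"priority must be >= 1, got {self.priority}")


t = Task("backup")
print(t)  # Task(name='backup', priority=1, tags=[])
```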

The Boilerplate Problem

Consider a class that represents a book in a collection. Without any shortcuts, the manual approach requires writing every dunder method by hand:

class Book:
    def __init__(self, title, author, pages, isbn):
        self.title = title
        self.author = author
        self.pages = pages
        self.isbn = isbn

    def __repr__(self):
        return (
            f"Book(title={self.title!r}, author={self.author!r}, "
            f"pages={self.pages!r}, isbn={self.isbn!r})"
        )

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (
            self.title == other.title
            and self.author == other.author
            and self.pages == other.pages
            and self.isbn == other.isbn
        )


b1 = Book("Network Security", "Kandi Brian", 420, "978-0-13-468599-1")
b2 = Book("Network Security", "Kandi Brian", 420, "978-0-13-468599-1")

print(b1)        # Book(title='Network Security', author='Kandi Brian', ...)
print(b1 == b2)  # True

This class has four fields, and the boilerplate already spans 18 lines. Each field name is repeated in four places: the __init__ signature, the self.field = field assignment, the __repr__ string, and the __eq__ comparison. Adding a fifth field means editing three methods. Removing a field means editing three methods. Every change is a chance for a typo to go unnoticed. PEP 557 identified exactly this class of problem: the Python community had long needed a cleaner way to define classes built primarily around data fields accessible by name. Data Classes are the standard library's answer.
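To make the typo risk concrete, here is a hypothetical two-field variant where __eq__ accidentally compares self.author against other.title. Python accepts the class without complaint; the bug only surfaces at comparison time:

```python
class BuggyBook:
    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __eq__(self, other):
        if not isinstance(other, BuggyBook):
            return NotImplemented
        # Typo: should be other.author, not other.title
        return self.title == other.title and self.author == other.title


a = BuggyBook("Dune", "Frank Herbert")
b = BuggyBook("Dune", "Frank Herbert")
print(a == b)  # False — two identical books compare unequal
```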

What @dataclass Generates

The same class written with @dataclass requires only the field declarations. The decorator generates __init__, __repr__, and __eq__ from the type annotations:

from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    pages: int
    isbn: str


b1 = Book("Network Security", "Kandi Brian", 420, "978-0-13-468599-1")
b2 = Book("Network Security", "Kandi Brian", 420, "978-0-13-468599-1")

print(b1)        # Book(title='Network Security', author='Kandi Brian', pages=420, isbn='978-0-13-468599-1')
print(b1 == b2)  # True

Seven lines replace eighteen. Each field name appears once. Adding or removing a field means changing a single line, and the generated methods automatically adjust. The type annotations (str, int) are not enforced at runtime; they serve as documentation for readers and type checkers like mypy and pyright.

Note

The @dataclass decorator does not create a new class. It modifies the existing class by adding methods to it and returns the same class object. This is different from function decorators, which typically return a new wrapper function.

The methods are generated once, at class definition time — not each time you create an instance. This means the overhead is paid when your module first imports, not in a hot loop constructing thousands of objects. Inspect the class after decoration and you will find real __init__, __repr__, and __eq__ attributes attached directly to it, indistinguishable from methods you had written yourself.

Generated __init__ in Detail

The generated __init__ for the Book class above is equivalent to:

# This is what @dataclass generates behind the scenes:
def __init__(self, title: str, author: str, pages: int, isbn: str):
    self.title = title
    self.author = author
    self.pages = pages
    self.isbn = isbn

The parameters appear in the same order as the field declarations in the class body. Fields with default values become parameters with default values. The generated method is a real method on the class, indistinguishable from one you wrote by hand.
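You can confirm the parameter order yourself with inspect.signature, which reads the generated method like any other:

```python
import inspect
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    pages: int
    isbn: str


sig = inspect.signature(Book.__init__)
print(list(sig.parameters))
# ['self', 'title', 'author', 'pages', 'isbn'] — declaration order preserved
```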

Defaults and field()

Simple default values work the same way they do in regular function signatures. Fields without defaults must come before fields with defaults:

from dataclasses import dataclass


@dataclass
class Server:
    hostname: str
    ip_address: str
    port: int = 443
    protocol: str = "HTTPS"


web = Server("web-01", "10.0.1.50")
print(web)
# Server(hostname='web-01', ip_address='10.0.1.50', port=443, protocol='HTTPS')

custom = Server("api-01", "10.0.1.60", port=8080, protocol="HTTP")
print(custom)
# Server(hostname='api-01', ip_address='10.0.1.60', port=8080, protocol='HTTP')

For mutable default values like lists or dictionaries, Python raises a ValueError if you assign them directly. This is a safety measure to prevent shared mutable state between instances. The field() function provides default_factory to handle this:

Why does @dataclass raise here, but a normal class does not?

In a plain class, writing tags: dict = {} is legal Python — it assigns one shared dictionary to a class variable. Every instance that reads self.tags without first setting it gets the same object. That shared state produces bugs that are notoriously hard to trace: one instance silently mutates data that another reads.

@dataclass detects this pattern at class-definition time and raises ValueError rather than letting the shared-state bug through. The default_factory parameter is the safe alternative — it calls the callable once per construction, so each instance starts with its own fresh object.

# list[str] and dict[str, str] built-in generics require Python 3.9+
# For Python 3.7/3.8 use: from typing import List, Dict
from dataclasses import dataclass, field


@dataclass
class FirewallRule:
    name: str
    action: str
    source_ips: list[str] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)


rule1 = FirewallRule("allow-ssh", "ALLOW")
rule1.source_ips.append("10.0.1.0/24")

rule2 = FirewallRule("block-telnet", "DENY")

# Each instance has its own list, not a shared one
print(rule1.source_ips)  # ['10.0.1.0/24']
print(rule2.source_ips)  # []

Writing this manually would require the same if tags is None: tags = {} pattern in __init__, which is easy to forget and produces bugs when it is missing. The field(default_factory=list) syntax makes the intent explicit and eliminates the risk of shared mutable state.
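For comparison, here is a sketch of that hand-written counterpart. The sentinel check must be repeated for every mutable field, in every class, and forgetting it reintroduces the shared-state bug:

```python
# Manual equivalent of field(default_factory=...): the None-sentinel pattern
class FirewallRuleManual:
    def __init__(self, name, action, source_ips=None, tags=None):
        self.name = name
        self.action = action
        self.source_ips = source_ips if source_ips is not None else []
        self.tags = tags if tags is not None else {}


r1 = FirewallRuleManual("allow-ssh", "ALLOW")
r2 = FirewallRuleManual("block-telnet", "DENY")
r1.source_ips.append("10.0.1.0/24")

print(r1.source_ips)  # ['10.0.1.0/24']
print(r2.source_ips)  # [] — each instance gets its own list
```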

Controlling repr, comparison, and init per field

Beyond defaults, field() accepts parameters that control whether a field participates in the generated __repr__, __eq__, and __init__. This matters whenever a field holds sensitive data, an internal implementation detail, or a value that should be derived rather than supplied by the caller:

from dataclasses import dataclass, field


@dataclass
class UserAccount:
    username: str
    email: str
    # repr=False keeps the password out of print() and log output
    password_hash: str = field(repr=False)
    # compare=False means two accounts are equal if username and email match,
    # regardless of when the account was created
    created_at: str = field(default="", compare=False)


a1 = UserAccount("kandi", "[email protected]", "abc123hash")
a2 = UserAccount("kandi", "[email protected]", "abc123hash", created_at="2026-01-01")

print(a1)
# UserAccount(username='kandi', email='[email protected]', created_at='')
# password_hash does not appear in the output

print(a1 == a2)  # True — created_at is excluded from comparison

The init=False parameter removes a field from the generated __init__ entirely. Combined with __post_init__, this is the standard pattern for fields whose values are derived from other fields at construction time rather than passed by the caller:

from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    # area is never passed in — it is always computed from width and height
    area: float = field(init=False, repr=True)

    def __post_init__(self):
        self.area = self.width * self.height


r = Rectangle(4.0, 6.0)
print(r)          # Rectangle(width=4.0, height=6.0, area=24.0)
print(r.area)     # 24.0

In a manual class, excluding a field from __repr__ means writing the __repr__ string by hand and simply omitting it. Excluding a field from __eq__ means writing the comparison by hand. Computing a derived field means adding lines to __init__ after all the assignments. With field(), each of these intentions is expressed as a parameter on the relevant field declaration, keeping the control co-located with the field itself.

One subtlety worth noting from the official docs: the hash parameter on field() defaults to None, which means it follows the value of compare. Setting compare=False therefore also excludes a field from the generated __hash__ unless you explicitly set hash=True to override that. The field() function also accepts a metadata parameter — a read-only mapping that dataclasses itself ignores but that third-party libraries such as Pydantic and marshmallow-dataclass use to attach validation rules, serialisation hints, and schema information directly to the field declaration.
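A short sketch of the metadata mechanism. The "unit" key below is our own convention for illustration; dataclasses stores it but attaches no meaning to it, and dataclasses.fields() exposes it back to any consumer:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Measurement:
    # metadata is a free-form mapping, preserved but ignored by dataclasses
    reading: float = field(metadata={"unit": "ms"})


(reading_field,) = fields(Measurement)
print(reading_field.metadata["unit"])  # ms
```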

Check Your Understanding
Question 1 of 3
You write tags: list[str] = [] as a field on a dataclass. What happens?

Validation with __post_init__

The auto-generated __init__ assigns field values but does not validate them. The __post_init__ method runs immediately after __init__ completes, giving you a hook to enforce constraints without writing a custom __init__:

# list[str] built-in generic requires Python 3.9+
from dataclasses import dataclass, field


@dataclass
class Subnet:
    cidr: str
    vlan_id: int
    description: str = ""
    hosts: list[str] = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= self.vlan_id <= 4094:
            raise ValueError(
                f"VLAN ID must be 1-4094, got {self.vlan_id}"
            )
        if "/" not in self.cidr:
            raise ValueError(
                f"CIDR must contain '/', got {self.cidr!r}"
            )


valid = Subnet("10.0.1.0/24", 100, "Production LAN")
print(valid)
# Subnet(cidr='10.0.1.0/24', vlan_id=100, description='Production LAN', hosts=[])

try:
    bad_vlan = Subnet("10.0.2.0/24", 5000)
except ValueError as e:
    print(f"Rejected: {e}")
# Rejected: VLAN ID must be 1-4094, got 5000

In a manual class, this validation would live inside __init__ itself, interleaved with the self.field = field assignments. With @dataclass, the assignment boilerplate is handled automatically, and __post_init__ contains only the validation logic. This separation makes the validation easier to find and easier to maintain.

InitVar: parameters that reach __post_init__ but are never stored

The init=False pattern shown earlier handles fields whose values are derived from other fields. There is a complementary pattern for values that should be accepted by __init__ and forwarded to __post_init__, but never stored as fields at all. The InitVar[T] type annotation (from the dataclasses module) marks a variable as init-only. It is passed through to __post_init__ but does not become an instance attribute and is not returned by dataclasses.fields():

import hashlib
from dataclasses import dataclass, field, InitVar


@dataclass
class HashedPassword:
    username: str
    # raw_password is passed to __init__ and __post_init__, but never stored
    raw_password: InitVar[str]
    password_hash: str = field(init=False, repr=False)

    def __post_init__(self, raw_password: str):
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()


account = HashedPassword("kandi", "s3cr3t!")
print(account.username)       # kandi
print(account.password_hash)  # sha256 hex digest
# account.raw_password        # AttributeError -- never stored

The raw password goes into __init__, is immediately hashed in __post_init__, and is then gone — it never appears in __repr__, is never part of equality comparisons, and cannot be retrieved after construction. This pattern is documented in PEP 557 under “Init-only variables” and is one of the least-known features of the module.

asdict(), astuple(), and replace()

The dataclasses module ships three helper functions that work on any dataclass instance and that manual classes do not provide for free.

dataclasses.asdict(obj) recursively converts a dataclass to a plain dictionary, which is the standard pattern for JSON serialisation. dataclasses.astuple(obj) converts it to a tuple. Both recurse into nested dataclasses, dicts, lists, and tuples, and copy other objects with copy.deepcopy().

dataclasses.replace(obj, **changes) creates a new instance of the same dataclass with specified fields replaced. This is the standard pattern for working with frozen dataclasses — since you cannot mutate a frozen instance, replace() gives you a modified copy:

from dataclasses import dataclass, replace, asdict


@dataclass(frozen=True)
class Config:
    host: str
    port: int = 443
    debug: bool = False


prod = Config("api.example.com")
dev  = replace(prod, host="localhost", port=8080, debug=True)

print(prod)  # Config(host='api.example.com', port=443, debug=False)
print(dev)   # Config(host='localhost', port=8080, debug=True)

# asdict for serialisation
import json
print(json.dumps(asdict(prod)))
# {"host": "api.example.com", "port": 443, "debug": false}

Note that replace() calls __init__ under the hood, which means __post_init__ also runs on the new instance. This is by design — any validation in __post_init__ applies equally to the copy. Passing a field that has init=False to replace() raises ValueError, since that field has no __init__ parameter to receive it.
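The recursion in asdict() and astuple() is easiest to see with a nested dataclass. The Engine and Car classes below are hypothetical examples:

```python
from dataclasses import dataclass, asdict, astuple


@dataclass
class Engine:
    horsepower: int


@dataclass
class Car:
    model: str
    engine: Engine


c = Car("Roadster", Engine(300))
print(asdict(c))   # {'model': 'Roadster', 'engine': {'horsepower': 300}}
print(astuple(c))  # ('Roadster', (300,))
```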

Spot the Bug
This dataclass stores a network alert with a unique ID derived from the current timestamp. It runs without error but every instance gets the same ID. What is the bug?
from dataclasses import dataclass, field
import time


@dataclass
class Alert:
    hostname: str
    severity: int
    alert_id: str = field(default=f"alert-{time.time()}", repr=True)


a1 = Alert("web-01", 3)
a2 = Alert("db-01", 5)

print(a1.alert_id)  # alert-1743000000.123
print(a2.alert_id)  # alert-1743000000.123 <-- identical
Which of these identifies the root cause?

Frozen, Slots, and Advanced Parameters

The @dataclass decorator accepts parameters that enable features which would require significant manual code. The three parameters that have the largest impact on code reduction are frozen, slots, and order.

frozen=True: Immutable Instances

# FrozenInstanceError is a subclass of AttributeError; it has been
# importable from the dataclasses module since Python 3.7
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Coordinate:
    latitude: float
    longitude: float


point = Coordinate(29.7604, -95.3698)

try:
    point.latitude = 0.0
except FrozenInstanceError as e:
    print(f"Blocked: {e}")
# Blocked: cannot assign to field 'latitude'

# Frozen dataclasses are hashable, so they can be dictionary keys
locations = {point: "Houston, TX"}
print(locations[Coordinate(29.7604, -95.3698)])
# Houston, TX

Writing this manually would require a custom __setattr__ that raises an error, a custom __delattr__ that raises an error, and a custom __hash__ that computes a hash from the fields. With frozen=True, the decorator generates all three.
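A rough sketch of that manual version follows. It is simplified relative to what the decorator actually emits (the real generated code also handles slots and inheritance edge cases), but it shows the three moving parts:

```python
class ManualCoordinate:
    def __init__(self, latitude, longitude):
        # object.__setattr__ bypasses the frozen __setattr__ defined below
        object.__setattr__(self, "latitude", latitude)
        object.__setattr__(self, "longitude", longitude)

    def __setattr__(self, name, value):
        raise AttributeError(f"cannot assign to field {name!r}")

    def __delattr__(self, name):
        raise AttributeError(f"cannot delete field {name!r}")

    def __eq__(self, other):
        if not isinstance(other, ManualCoordinate):
            return NotImplemented
        return (self.latitude, self.longitude) == (other.latitude, other.longitude)

    def __hash__(self):
        return hash((self.latitude, self.longitude))


p = ManualCoordinate(29.7604, -95.3698)
try:
    p.latitude = 0.0
except AttributeError as e:
    print(f"Blocked: {e}")

print({p: "Houston"}[ManualCoordinate(29.7604, -95.3698)])  # Houston
```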

slots=True: Memory Efficiency (Python 3.10+)

from dataclasses import dataclass
import tracemalloc


@dataclass
class RegularPoint:
    x: float
    y: float


@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float


# sys.getsizeof() measures only the object header and misses the __dict__
# overhead, so it gives misleading results for this comparison.
# Use tracemalloc to measure actual allocation cost across many instances.

tracemalloc.start()
regular_instances = [RegularPoint(float(i), float(i)) for i in range(100_000)]
_, regular_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
slotted_instances = [SlottedPoint(float(i), float(i)) for i in range(100_000)]
_, slotted_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Regular peak: {regular_peak / 1024 / 1024:.1f} MB")
print(f"Slotted peak: {slotted_peak / 1024 / 1024:.1f} MB")
# Regular peak: ~10.4 MB
# Slotted peak:  ~7.6 MB  (roughly 25-40% less)

print(hasattr(RegularPoint(0.0, 0.0), "__dict__"))  # True
print(hasattr(SlottedPoint(0.0, 0.0), "__dict__"))  # False

The slots=True parameter generates a __slots__ declaration that prevents the creation of a per-instance __dict__. The savings are not visible through sys.getsizeof() on a single instance because that function measures only the object header and misses the dictionary overhead entirely. The difference shows up when allocating many instances together, where the __dict__ overhead compounds. Without the decorator, you would need to manually declare __slots__ and ensure it matches your fields exactly. Because implementing this required returning a new class rather than modifying the original, PEP 557 author Eric V. Smith intentionally held it back from Python 3.7 and added it only in Python 3.10 once the demand was clear enough to justify bending the “no new class” design principle.

order=True: Comparison Operators

from dataclasses import dataclass


@dataclass(order=True)
class Severity:
    level: int
    name: str


low = Severity(1, "LOW")
medium = Severity(2, "MEDIUM")
high = Severity(3, "HIGH")
critical = Severity(4, "CRITICAL")

alerts = [critical, low, high, medium]
alerts.sort()

for alert in alerts:
    print(f"  {alert.name} (level {alert.level})")
# LOW (level 1)
# MEDIUM (level 2)
# HIGH (level 3)
# CRITICAL (level 4)

With order=True, the decorator generates __lt__, __le__, __gt__, and __ge__. These compare instances by their fields in declaration order. Writing all four comparison methods manually is tedious and error-prone, and the decorator handles it with a single parameter.
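Because comparison follows declaration order across all fields, ties on level would fall back to comparing names alphabetically. If that is not what you want, field(compare=False) excludes a field from the generated ordering (and, as noted earlier, from equality too):

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Severity:
    level: int
    # Excluded from ordering and equality — only level is compared
    name: str = field(compare=False)


print(Severity(2, "MEDIUM") > Severity(1, "ZZZ"))    # True — only level matters
print(Severity(3, "HIGH") == Severity(3, "SEVERE"))  # True — name excluded
```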

Feature                   | Manual class                    | @dataclass
--------------------------|---------------------------------|--------------------------------
__init__                  | Write by hand                   | Auto-generated from annotations
__repr__                  | Write by hand                   | Auto-generated
__eq__                    | Write by hand                   | Auto-generated
Ordering (<, >, etc.)     | Write 4 methods by hand         | order=True
Immutability              | Custom __setattr__ + __hash__   | frozen=True
Memory optimization       | Manual __slots__ declaration    | slots=True (3.10+)
Mutable defaults          | None-check pattern in __init__  | field(default_factory=list)
Post-init validation      | Inside __init__                 | Separate __post_init__ method

The unifying mental model

Every row in that table represents code you would write identically in class after class — the field names change, the logic never does. That is the exact criterion for a decorator: a mechanically predictable transformation that adds no domain-specific reasoning. @dataclass does not make decisions; it executes a fixed recipe. When a class needs decisions — custom equality logic, resource acquisition, field-order overrides — you write those yourself. The decorator fills the rest.

Dataclass Inheritance

A dataclass can inherit from another dataclass. The child class gains all parent fields, and those fields appear first in the generated __init__, preserving declaration order. The child can add new fields and its own __post_init__ alongside the parent’s:

from dataclasses import dataclass


@dataclass
class Asset:
    hostname: str
    ip_address: str
    owner: str = "unassigned"


@dataclass
class Server(Asset):
    os: str = "Linux"
    cpu_cores: int = 4


@dataclass
class WebServer(Server):
    port: int = 443
    tls_enabled: bool = True


web = WebServer("web-01", "10.0.1.10", owner="ops-team", port=8443)
print(web)
# WebServer(hostname='web-01', ip_address='10.0.1.10', owner='ops-team',
#           os='Linux', cpu_cores=4, port=8443, tls_enabled=True)

# Equality compares all fields across the full inheritance chain
clone = WebServer("web-01", "10.0.1.10", owner="ops-team", port=8443)
print(web == clone)  # True

One constraint to be aware of: if a parent dataclass has a field with a default value, all fields in every child must also have default values. This is the same rule that applies to regular function signatures — non-default arguments cannot follow default ones. If you try to add a required field to a child of a parent that has optional fields, Python raises a TypeError at class definition time.

from dataclasses import dataclass


@dataclass
class Base:
    name: str
    tag: str = "default"   # has a default


# This raises TypeError: non-default argument 'priority' follows default argument
# @dataclass
# class Child(Base):
#     priority: int         # no default, but Base.tag has one

# Fix: give the child field a default too, or use kw_only=True (Python 3.10+)
@dataclass(kw_only=True)
class Child(Base):
    priority: int           # required, but keyword-only avoids the ordering conflict

c = Child(name="alert", priority=1)
print(c)  # Child(name='alert', tag='default', priority=1)

The kw_only=True decorator parameter (Python 3.10+) resolves the ordering conflict by making all child fields keyword-only in the generated __init__. It can also be applied to individual fields via field(kw_only=True) when you only need to restrict specific fields rather than the entire class.

When to Use a Manual Class Instead

The @dataclass decorator is designed for classes that primarily store data. There are scenarios where a manually written class is the better choice.

Complex initialization logic. If __init__ needs to open files, establish network connections, allocate resources, or perform multi-step setup that goes beyond assigning field values, that logic belongs in a hand-written __init__. While __post_init__ can handle validation, it is not intended for heavyweight resource acquisition.

Behavior-oriented classes. Classes that exist primarily to encapsulate behavior (methods) rather than to store structured data do not benefit from @dataclass. The distinction between data-holding and behavior-holding classes is covered in detail in the guide to composition vs inheritance. A class that represents a database connection pool or a thread manager has methods as its primary interface, not fields.

Custom equality semantics. If __eq__ should compare only a subset of fields or use different logic than field-by-field comparison, the auto-generated __eq__ will not work correctly. You can set eq=False and write your own, but at that point you have lost one of the main benefits of the decorator.

# A behavior-oriented class: manual __init__ is appropriate here
class DatabasePool:
    def __init__(self, connection_string, max_connections=10):
        self._connection_string = connection_string
        self._max_connections = max_connections
        self._pool = []
        self._initialize_pool()

    def _initialize_pool(self):
        """Create initial connections."""
        for _ in range(self._max_connections):
            self._pool.append(self._create_connection())

    def _create_connection(self):
        """Simulate creating a database connection."""
        return {"connection": self._connection_string, "active": True}

    def acquire(self):
        """Get a connection from the pool."""
        if not self._pool:
            raise RuntimeError("No connections available")
        return self._pool.pop()

    def release(self, conn):
        """Return a connection to the pool."""
        self._pool.append(conn)

This class creates connections during initialization, manages a mutable pool, and exposes acquire/release as its primary interface. Converting this to a @dataclass would be awkward because the initialization logic is the point of the class, not a side effect of field assignment.

Pro Tip

A good rule of thumb: if you find yourself describing the class by its fields ("it has a name, an IP, and a port"), use @dataclass. If you describe it by its behavior ("it manages connections" or "it processes events"), use a manual class.

@dataclass vs NamedTuple

Before @dataclass existed, typing.NamedTuple was the standard way to define a lightweight, readable data container in Python. Both solve the boilerplate problem, but they make different trade-offs that affect which one belongs in a given situation.

from typing import NamedTuple
from dataclasses import dataclass


class PointNT(NamedTuple):
    x: float
    y: float


@dataclass
class PointDC:
    x: float
    y: float


nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

# NamedTuple instances are tuples under the hood
print(isinstance(nt, tuple))   # True
print(nt[0])                   # 1.0  — supports index access
print(len(nt))                 # 2

# Dataclass instances are plain class instances
print(isinstance(dc, tuple))   # False
# dc[0]                        # TypeError — no index access

# NamedTuple instances are immutable by default
# nt.x = 9.0                   # AttributeError

# Dataclass instances are mutable by default
dc.x = 9.0                     # fine

# Equality: NamedTuple compares equal to a plain tuple with the same values
print(nt == (1.0, 2.0))        # True  — may be surprising
print(dc == (1.0, 2.0))        # False — dataclass only equals another PointDC

The tuple equality behaviour of NamedTuple is the sharpest practical difference. A NamedTuple instance will compare equal to any tuple whose values match, regardless of type. That can produce silent bugs when mixing named and unnamed tuples in collections or comparisons. A dataclass only compares equal to another instance of the same class, which is the expected behaviour for a domain object.

The table below maps out the full set of trade-offs:

Feature                   | NamedTuple                                          | @dataclass
--------------------------|-----------------------------------------------------|--------------------------------------------------
Mutability                | Immutable by default                                | Mutable by default; frozen=True for immutability
Is a tuple                | Yes — supports indexing, unpacking, len()           | No — plain class instance
Equality with plain tuple | Yes — Point(1, 2) == (1, 2) is True                 | No — only equal to same type
Inheritance               | Limited — adding fields in subclasses not supported | Full support with field ordering rules
Derived / computed fields | Not possible without a workaround                   | field(init=False) + __post_init__
Memory layout             | Tuple — compact, no __dict__                        | Normal class; use slots=True to remove __dict__
Hashable by default       | Yes — tuples are hashable if all values are         | No — unless frozen=True

Choose NamedTuple when you need tuple unpacking, tuple-compatible APIs (functions that expect a sequence), or a guaranteed-immutable record with no derived fields. Choose @dataclass when the object needs to be mutable, when you need inheritance, when some fields are derived from others, or when accidental equality with a plain tuple would be a hazard.

Frequently Asked Questions

What does the @dataclass decorator generate automatically?

The @dataclass decorator automatically generates __init__, __repr__, and __eq__ methods based on the class's type-annotated fields. With additional parameters it can also generate __lt__, __le__, __gt__, __ge__ (via order=True), __hash__ (via frozen=True), and __slots__ (via slots=True in Python 3.10+). The decorator is documented in PEP 557 and was added to the standard library in Python 3.7.

Can I still add custom methods to a dataclass?

Yes. A dataclass is a regular Python class. You can add any custom methods, properties, class methods, or static methods alongside the auto-generated ones. The @dataclass decorator only generates the boilerplate methods; it does not restrict what else you add to the class body.

How do I validate fields in a dataclass?

Use the __post_init__ method. It runs immediately after the auto-generated __init__ completes, giving you access to all field values for validation. Raise ValueError or TypeError inside __post_init__ to reject invalid data. For init-only parameters that should not be stored as fields, use InitVar[T] type annotations, which pass the value to __post_init__ without creating an instance attribute.

What is the difference between frozen=True and a manual class with read-only properties?

frozen=True generates a __setattr__ that raises FrozenInstanceError on any attribute assignment after initialization, along with __delattr__ and __hash__. This applies to all fields uniformly with a single decorator parameter. A manual class requires writing individual @property getters without setters for each field, which is significantly more code. Frozen dataclasses also support dataclasses.replace() for creating modified copies.
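A sketch with a hypothetical Config class shows the generated __setattr__ guard, the hashability, and replace() working together:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Config:
    host: str
    port: int

c = Config("localhost", 8080)
try:
    c.port = 9090             # any assignment after __init__ raises
except FrozenInstanceError:
    print("immutable")

# Frozen instances are hashable, and equal instances hash equal.
print(hash(c) == hash(Config("localhost", 8080)))  # True
print(replace(c, port=9090))  # Config(host='localhost', port=9090)
```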

When should I use a manual class instead of @dataclass?

Use a manual class when the class is primarily behavior-oriented rather than data-oriented, when __init__ requires complex logic such as opening files or establishing connections, when you need fine-grained control over __eq__ that does not compare all fields, or when the class uses metaclasses or descriptors that conflict with the decorator. The decorator is not intended as a replacement for libraries like attrs that offer different feature sets.

What is dataclasses.replace() and when should I use it?

dataclasses.replace(obj, **changes) creates a new instance of the same dataclass type with specified fields replaced. It is the standard pattern for working with frozen dataclasses, since frozen instances cannot be mutated directly. replace() calls __init__ under the hood, which means __post_init__ validation also runs on the copy. Passing a field with init=False to replace() raises ValueError.
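Both behaviors can be sketched briefly with hypothetical Order and Tagged classes:

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Order:
    item: str
    qty: int

o1 = Order("widget", 2)
o2 = replace(o1, qty=5)       # new instance; o1 is untouched
print(o1, o2)

# replace() reruns __init__, so fields declared init=False cannot be passed.
@dataclass
class Tagged:
    name: str
    tag: str = field(init=False, default="n/a")

try:
    replace(Tagged("x"), tag="y")
except ValueError as e:
    print("rejected:", e)
```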

What is the difference between @dataclass and NamedTuple?

Both eliminate __init__ boilerplate for data containers but differ in several important ways. NamedTuple instances are actual tuples: they support index access, unpacking, and len(), and they compare equal to plain tuples with matching values. Dataclass instances are plain class objects that only compare equal to other instances of the same class. NamedTuples are immutable by default; dataclasses are mutable by default. Dataclasses support inheritance and computed fields via field(init=False); NamedTuples do not. Choose NamedTuple when tuple semantics are needed and @dataclass for everything else.

What is InitVar in Python dataclasses?

InitVar[T] is a type annotation from the dataclasses module that marks a variable as init-only. Fields annotated with InitVar are accepted by the generated __init__ and forwarded to __post_init__, but they are never stored as instance attributes and are not returned by dataclasses.fields(). This is useful for constructor parameters like passwords or configuration tokens that should be processed but not retained.
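The password case can be sketched with a hypothetical User class: the plaintext value reaches __post_init__, a derived hash is stored, and the original is never kept:

```python
from dataclasses import dataclass, field, fields, InitVar
import hashlib

@dataclass
class User:
    name: str
    password: InitVar[str]             # accepted by __init__, never stored
    password_hash: str = field(init=False)

    def __post_init__(self, password: str):
        # InitVar values arrive here as arguments.
        self.password_hash = hashlib.sha256(password.encode()).hexdigest()

u = User("alice", "s3cret")
print(hasattr(u, "password"))           # False: not an instance attribute
print([f.name for f in fields(u)])      # ['name', 'password_hash']
```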

Can a dataclass inherit from another dataclass?

Yes. A child dataclass inherits all fields from its parent, and those fields appear first in the generated __init__. The child can add new fields and its own __post_init__. One constraint: if the parent has any field with a default value, all fields in the child must also have defaults. Use kw_only=True on the child class (Python 3.10+) to resolve this ordering conflict when you need a required field in a child of a parent with optional fields.

How do I exclude a field from __repr__ or __eq__ in a dataclass?

Use the field() function with repr=False to exclude a field from the generated __repr__, or compare=False to exclude it from __eq__ and ordering comparisons. Note that compare=False also affects __hash__ by default, since the hash parameter follows compare unless overridden explicitly. For a field that should not appear in __init__ at all, use init=False and assign its value in __post_init__. For a parameter that should pass through __init__ to __post_init__ but never be stored, use InitVar[T].
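All three per-field controls can be sketched in one hypothetical Session class:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    user: str
    token: str = field(repr=False)                        # hidden from __repr__
    last_seen: float = field(default=0.0, compare=False)  # ignored by __eq__
    cache: dict = field(default_factory=dict, init=False, repr=False)

a = Session("alice", "abc123", last_seen=1.0)
b = Session("alice", "abc123", last_seen=2.0)
print(a)            # Session(user='alice', last_seen=1.0) -- no token shown
print(a == b)       # True: last_seen is excluded from comparison
```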

Key Takeaways

  1. @dataclass eliminates repetitive boilerplate. It auto-generates __init__, __repr__, and __eq__ from type-annotated fields. Each field name appears once instead of four times across three methods.
  2. Use field(default_factory=...) for mutable defaults. Lists, dictionaries, and sets cannot be assigned directly as default values. The field() function provides a factory that creates a new instance for each object, preventing shared mutable state.
  3. __post_init__ handles validation separately from assignment. It runs after the auto-generated __init__ completes and can raise exceptions for invalid field values. This separates "set the fields" from "check the fields."
  4. Decorator parameters unlock advanced features. frozen=True makes instances immutable and hashable. slots=True (Python 3.10+) reduces memory usage. order=True generates all four comparison operators. Each parameter replaces multiple manually written methods.
  5. Use manual classes for behavior-oriented or resource-managing code. When __init__ needs complex setup logic, when the class exists primarily for its methods rather than its fields, or when equality needs custom semantics, a hand-written class provides the control that @dataclass intentionally abstracts away.
  6. Use field() parameters to control individual fields. repr=False excludes sensitive data from output. compare=False removes a field from equality checks. init=False with __post_init__ handles computed fields whose values derive from other fields.
  7. Dataclasses support full inheritance; NamedTuples do not. Child dataclasses inherit parent fields in declaration order. Use kw_only=True (Python 3.10+) to resolve ordering conflicts when a parent has optional fields and a child needs required ones. Prefer NamedTuple only when tuple semantics — indexing, unpacking, or tuple-compatible equality — are genuinely needed.
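The mutable-default rule from takeaway 2 deserves one concrete sketch, since it is the most common dataclass pitfall. Assuming a hypothetical Playlist class:

```python
from dataclasses import dataclass, field

@dataclass
class Playlist:
    name: str
    # A bare `songs: list = []` would be rejected by @dataclass with a
    # ValueError; default_factory builds a fresh list per instance instead.
    songs: list = field(default_factory=list)

a = Playlist("road trip")
b = Playlist("gym")
a.songs.append("song one")
print(a.songs, b.songs)   # ['song one'] [] -- no shared state between instances
```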

The @dataclass decorator is one of the clearest examples of a class decorator providing tangible, measurable value. It removes the code that programmers write identically in class after class and replaces it with a single line that communicates intent: this class stores data. The fields declare what the data is. The decorator handles everything else. For classes where that description fits, the reduction in boilerplate is not just a convenience — it is a reduction in the surface area where bugs can hide. For more Python tutorials covering decorators, type hints, and modern class patterns, the full library is available on PythonCodeCrack.