Dash0 Raises $110M Series B at $1B Valuation

  • 12 min read

Postgres CREATE TABLE: Syntax, Examples, and Common Options

CREATE TABLE is where most database mistakes happen. Not because the syntax is tricky, but because decisions you make here tend to calcify: changing a column's data type on a table with 50 million rows is a lot less fun than getting it right the first time.

This article covers the full syntax, the constraints worth knowing, and the pitfalls that will bite you later if you skip them now.

The basic syntax

Here's a table you could actually ship to production:

sql
12345678
CREATE TABLE orders (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customers (id),
status varchar(20) NOT NULL DEFAULT 'pending',
total numeric(12, 2) NOT NULL CHECK (total >= 0),
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz
);

The column definitions follow a straightforward pattern: name, type, constraints. The sections below explain the choices in that example.

Choosing the right data types

Integers. You have smallint (2 bytes), integer (4 bytes), and bigint (8 bytes). Use bigint for primary keys, integer tops out at ~2 billion, which sounds like a lot until your events table is generating 10 million rows a day. For foreign keys, integer is fine as long as you're confident the referenced table won't outgrow it.

Exact decimals. numeric(precision, scale) is exact. real and double precision are floating-point and will accumulate rounding errors. Use numeric for money. Don't use floating-point for money. This is not a nuanced take.

Strings. text is the right default in PostgreSQL. varchar(n) makes sense when you need a database-level length cap, status codes, country codes, that kind of thing. char(n) pads shorter values with spaces and is almost never what you want. One thing worth calling out: varchar(255) is a MySQL artifact. That number carries no special meaning in PostgreSQL. If you're writing it out of habit, text is simpler.

Timestamps. Use timestamptz, not timestamp. PostgreSQL stores timestamptz as UTC and converts to the session timezone on read. Plain timestamp has no timezone information, it stores whatever value you give it. If your inserts come from servers in multiple regions, or if you ever change a system timezone, values stored as timestamp become ambiguous in ways that are painful to untangle later.

Booleans. boolean. Not smallint, not char(1). PostgreSQL accepts true, false, 't', 'f', 'yes', 'no', 1, and 0 on input, so there's no need to simulate it.

Constraints that actually enforce your data model

Adding a constraint to a large existing table takes a full scan and often a long lock. The time to add them is now.

NOT NULL is the most skipped constraint and the one you'll miss most. If a column should always have a value, say so at the schema level. "We validate in the application" sounds reasonable until a migration script, bulk import, or raw psql session goes around the application. The constraint is the only thing that's always enforced.

PRIMARY KEY implies NOT NULL and UNIQUE. Every table should have one. If you don't define one, PostgreSQL won't stop you, but logical replication and ON CONFLICT ... DO UPDATE both need a primary key to function correctly.

UNIQUE prevents duplicate values. You can put it on a single column or define it as a table-level constraint across multiple columns:

sql
123456
CREATE TABLE memberships (
user_id bigint NOT NULL REFERENCES users (id),
org_id bigint NOT NULL REFERENCES organizations (id),
role varchar(20) NOT NULL DEFAULT 'member',
UNIQUE (user_id, org_id)
);

CHECK enforces any boolean expression. Good for ruling out states that should never exist:

sql
123
created_at timestamptz NOT NULL,
expires_at timestamptz NOT NULL,
CHECK (expires_at > created_at)

FOREIGN KEY (written as REFERENCES other_table (column)) links rows in this table to rows in another. If you omit ON DELETE, the default is NO ACTION, which rejects deletes that would leave orphaned rows. For non-deferred constraints this behaves the same as RESTRICT; the difference only matters if you're using DEFERRABLE INITIALLY DEFERRED. Make it explicit anyway, ON DELETE NO ACTION communicates intent in a way that a missing clause doesn't.

IF NOT EXISTS

Add this whenever the statement runs in a migration or init script that might execute more than once:

sql
12345
CREATE TABLE IF NOT EXISTS feature_flags (
name text PRIMARY KEY,
enabled boolean NOT NULL DEFAULT false,
created_at timestamptz NOT NULL DEFAULT now()
);

Without IF NOT EXISTS, the statement fails with an error if the table exists. With it, PostgreSQL issues a notice and continues. Small thing, saves debugging time.

The SERIAL pitfall

If you've written PostgreSQL for a while, you've probably used SERIAL:

sql
12345
-- Don't do this for new schemas
CREATE TABLE events (
id serial PRIMARY KEY,
...
);

SERIAL still works, but it has a real problem. It's not actually a data type, it's syntactic sugar that creates a sequence, sets a column default, and links them loosely together. The sequence ends up with its own separate ACL entry, not covered by grants on the table. So if you run GRANT INSERT ON events TO app_user, that user gets a permission denied for sequence events_id_seq error on the first insert. This is a fun one to debug at 2am.

Since PostgreSQL 10, the correct alternative is GENERATED ALWAYS AS IDENTITY:

sql
1234
CREATE TABLE events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
created_at timestamptz NOT NULL DEFAULT now()
);

Here the sequence is an internal dependency of the column. Table grants cover it. Dropping the column drops the sequence. Copying the table schema creates a new independent sequence rather than sharing the original's. It's SQL standard, and it's what SERIAL should have been.

One variant to know: if you need to insert a specific id during a data migration, use GENERATED BY DEFAULT AS IDENTITY. The ALWAYS form rejects explicit values unless you add OVERRIDING SYSTEM VALUE, which is intentionally inconvenient.

sql
12345
-- Allows explicit id values; useful when migrating data from another system
CREATE TABLE events (
id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
created_at timestamptz NOT NULL DEFAULT now()
);

Switch back to GENERATED ALWAYS once the migration is done.

Temporary tables

TEMP or TEMPORARY creates a table that exists only for the current session:

sql
1234
CREATE TEMP TABLE staging_import (
raw_line text,
row_number bigint GENERATED ALWAYS AS IDENTITY
);

Temp tables live in a session-private schema, invisible to other connections. They're good for ETL staging, breaking up complex queries, or any intermediate result you don't want to persist. One thing to watch: autovacuum can't touch temp tables, so if you load bulk data and then run complex joins, run ANALYZE staging_import first.

Creating a table from an existing one

Copy structure only with LIKE:

sql
1
CREATE TABLE orders_archive (LIKE orders INCLUDING ALL);

INCLUDING ALL copies column definitions, defaults, constraints, and indexes. Identity columns get new independent sequences, which is the correct behavior. If you do this with SERIAL columns, the copy shares the original sequence, another reason to prefer identity columns.

Copy structure and data with CREATE TABLE AS:

sql
123
CREATE TABLE orders_2025 AS
SELECT * FROM orders
WHERE created_at < '2026-01-01';

This gives you a populated table, but without indexes, constraints, or identity properties. Add those after the fact if you need them.

Common pitfalls

The first two on this list are responsible for the most production pain. The rest are real but recoverable.

timestamp instead of timestamptz. Your servers are probably not all in the same timezone as your database. Even if they are now, they might not be in a year. Store timestamptz; convert to display time in the application.

Missing NOT NULL. "We validate in the application" is not an excuse. Migrations, ETL scripts, and direct psql connections don't know about your application's validation layer. Constraints belong in the database.

varchar(255) as a default. This number comes from MySQL's historical storage behavior and means nothing in PostgreSQL. If you need a length cap, pick one that reflects the actual data. If you don't need a cap, use text.

SERIAL on a shared database. The sequence permission issue is easy to miss in development, where the connecting user is often the table owner. It surfaces in staging or production, where service accounts have narrower roles. Switch to identity columns.

integer primary keys on high-volume tables. 2,147,483,647 rows sounds like enough. Events, logs, and audit records can get there faster than you'd expect. Use bigint.

Final thoughts

Observability for your database layer

Slow queries are usually the first symptom after a schema problem, missing indexes, bad data types, large table scans. PostgreSQL surfaces query execution data through pg_stat_statements, and the OpenTelemetry database semantic conventions (db.query.text, db.operation.name, db.namespace) let you correlate application traces with what's actually happening at the database level. Dash0's guide to trace-aware PostgreSQL logging walks through how to set that pipeline up end-to-end.

Dash0's PostgreSQL integration puts slow queries, query throughput, and error rates next to the traces and logs from the services calling your database, so you're not jumping between tools to figure out what's slow and why. Start a free trial to see your PostgreSQL metrics, application traces, and logs in one place. No credit card required.