
Art as codec

Every art form is a codec — a chosen tradeoff among Shannon bandwidth, semantic density, required priors, and level of generalization. The hierarchy of media (text → sound → image → embodied) is orthogonal to the hierarchy of abstraction (iconic → archetypal → abstract → conceptual). Shannon bits mislead because language and convention pre-compress meaning before the artwork starts; the operative yardstick is bits-of-insight per prepared receiver, not bits in the artifact.
🌱 seedling tended 2026-05-14 research art aesthetics information-theory compression codec hierarchy representation
flowchart LR
  sig[signal · raw bits] --> form[form · grammar]
  form --> sty[style · idiolect]
  sty --> work[work · instance]
  work --> theme[theme · referent]
  theme --> meta[meta-claim]
  meta -.-> sig
  ref[referent ladder · iconic → conceptual] -.orthogonal.- sig
  med[medium · text · sound · image · embodied] -.orthogonal.- sig

Synthesis. Anchored in Shannon (1948), Kolmogorov (1965), Peirce (icon/index/symbol), Cassirer (Philosophy of Symbolic Forms), Benjamin (aura, 1936), Birkhoff (Aesthetic Measure, 1933), Berlyne (psychobiology of aesthetics, 1971), and Schmidhuber (interestingness ≈ compression-progress, 1991–2009). Numeric estimates are order-of-magnitude. Rating: medium — frame is well-supported, the leaf claims about each medium's unique codomain are defensible but contested.

Status: seedling | 2026-05-14 | rating: medium
Compress levels: L0 ↓ L1 ↓ L2

Every art form is a codec. The question is not "how much information is in the artwork" but "what slice of experience does this codec natively encode, and how much does the prepared receiver get back."

L0 — TL;DR (≤5 lines)

"How much information does art carry?" hides four different questions — Shannon (bits in the artifact), Kolmogorov (irreducible structure), semantic (what reaches a prepared receiver), and generalization level (how abstract the referent is). Each art form is a codec with a distinctive position on all four. Media (text · sound · image · embodied) and abstraction (iconic → archetypal → conceptual) are orthogonal — two axes of one map. The strongest claim is operational: every medium has a codomain it natively encodes, and good work picks the medium that matches its signal rather than fighting one that does not.

L1 — Overview

Core question

When we say a piece of art "represents" something, what does represent mean across media — paint, sound, language, body, space, code? How much information actually moves from artifact to receiver, and what does each medium uniquely carry that the others cannot? Where do hierarchies of generalization sit — are they layered inside each work, or across the art forms, or both?

Why it matters

  • Debates about "what is art" routinely collapse three different axes (medium, abstraction, transmission). Separating them turns vague arguments into testable claims.
  • For systems that generate or judge art (humans or models), the codec frame names what to optimize for, semantic bits per prepared receiver, rather than what is easy to count, such as pixel fidelity or Fréchet Inception Distance (FID).
  • For practitioners, the codec frame is a tool: find the medium whose codomain already contains your signal. Fighting a medium is paying compression overhead for nothing.
  • For receivers, it explains the asymmetry of preparation — bring more priors, receive more bits, full stop. The artwork has not changed; the channel just got wider.
  • For the broader COMPRESSIONS project, art is the oldest, most diverse codec catalog humans have. Studying it teaches the general theory.

Mermaid map (L1)

flowchart LR
  subgraph axes[four axes]
    sh[Shannon · bandwidth]
    k[Kolmogorov · structure]
    sem[Semantic · transmitted]
    gen[Generalization · abstraction]
  end

  subgraph media[medium hierarchy]
    txt[text · poetry · prose]
    snd[sound · music · speech]
    img[image · painting · photo · film]
    emb[embodied · dance · arch · ritual]
  end

  subgraph layers[layered in every work]
    L0s[signal]
    L1s[form · grammar]
    L2s[style · idiolect]
    L3s[work]
    L4s[theme]
    L5s[meta-claim]
    L0s --> L1s --> L2s --> L3s --> L4s --> L5s
  end

  axes -.measure.-> media
  axes -.measure.-> layers
  media -.host.-> layers

Skeleton sub-claims

  • "Information in art" is four quantities, not one — and they often point opposite directions (e.g. a Rothko: low Kolmogorov, high semantic resonance; a static field: high Kolmogorov, zero semantic).
  • The right unit is semantic bits per prepared receiver, not bits in the artifact.
  • Language is pre-compressed cognition. Linguistic art (poetry, prose) achieves among the best semantic-to-raw ratios because it inherits natural language's pre-trained codebook.
  • Each art form has a codomain — a slice of experience it natively encodes — and other forms can only gesture at that slice from outside.
  • Hierarchy of media (channel structure) and hierarchy of generalization (referent abstraction) are orthogonal. Every work has a position on both.
  • Inside any work, there are ~6 stacked layers (signal → form → style → work → theme → meta-claim). Most works pack densely on 0–3 and point sparsely at 4–5.
  • The interestingness of a work tracks Schmidhuber's compressibility-progress: too random is noise, too patterned is wallpaper, the interesting work lives where the receiver's model is improving fastest.
  • The codec includes the audience. Same artifact + different priors = different transmission rate. Mechanical reproduction (Benjamin) and context (Duchamp) are codec changes, not metadata.

L2 — Deep dive

1. Four axes of "information"

| Axis | What it measures | Maximised by | Failure mode |
|---|---|---|---|
| Shannon | Bits to encode the signal at chosen fidelity | High-bandwidth media (4K film, VR, large sculpture) | Confuses bandwidth with meaning: TV static is maximally Shannon-rich, zero semantic. |
| Kolmogorov | Length of the shortest program that generates the artifact | Genuinely irreducible structure | Kolmogorov complexity is uncomputable, so it is only estimated (upper-bounded) by compressors. Glitch / random art scores high but for the wrong reason. |
| Semantic | Information transmitted to a prepared receiver | Codes the receiver already speaks (language, genre, iconography) | Requires a model of the receiver. No transmission without prior. See REFLECTIONS-AND-RECEIVERS. |
| Generalization | How abstract / categorical the referent is | Higher in the iconic → conceptual ladder | Not a scalar: direction (toward universals) matters more than magnitude. |

A claim like "this painting contains a lot of information" is almost always covertly invoking one axis while sounding like it invokes another. The honest version: "this painting has high Shannon bandwidth and modest semantic transfer to most viewers but very high semantic transfer to viewers fluent in the relevant iconography."
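
The Kolmogorov row says the quantity is only ever estimated by compressors. A minimal sketch of that estimate, assuming Python's standard `zlib` as the compressor and two made-up byte strings as stand-ins for TV static and a repeating pattern (neither input comes from the source):

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size / raw size: a crude upper-bound proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical inputs: pseudo-random bytes stand in for TV static,
# a four-byte motif repeated 2,500 times stands in for wallpaper.
static = bytes(rng.randrange(256) for _ in range(10_000))
pattern = b"abcd" * 2_500

static_ratio = compression_ratio(static)
pattern_ratio = compression_ratio(pattern)

print(f"static:  {static_ratio:.3f}")   # close to 1: essentially incompressible
print(f"pattern: {pattern_ratio:.3f}")  # close to 0: almost fully compressible
```

The static scores highest on this proxy while carrying no meaning for any receiver, which is the failure mode the table names. The ratio says nothing about the semantic axis; it needs a receiver model, not a compressor.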

2. Hierarchical taxonomy of art forms

Organised by channel structure (what the codec must encode), which is the most stable axis:

flowchart TB
  ART[ART · organised expression compressing experience]
  ART --> T[TIME-BASED · 1D evolution + content]
  ART --> S[STATIC · frozen, traversed by gaze]
  ART --> C[CONTINGENT · state-dependent, receiver in loop]

  T --> T1[auditory · music · sound art · radio drama]
  T --> T2[linguistic · spoken literature · poetry · oratory]
  T --> T3[visual-temporal · film · animation · video · kinetic]
  T --> T4[embodied · dance · theatre · performance · ritual]

  S --> S1[2D visual · painting · drawing · photo · print]
  S --> S2[3D visual · sculpture · ceramics · jewellery]
  S --> S3[linguistic-text · poetry-on-page · calligraphy · concrete]
  S --> S4[spatial · architecture · garden · installation]

  C --> C1[generative · algorithmic · system music · evolutionary]
  C --> C2[interactive · games · VR · participatory]
  C --> C3[conversational · improv · AI art · social practice]

Each leaf is a codec. Each codec has a distinctive joint signature on five sub-properties:

| Sub-property | What it is | Examples of extremes |
|---|---|---|
| Bandwidth | bits/sec or bits/artifact | film ≫ haiku |
| Latency | time to consume one unit | architecture · seconds–hours; novel · ~10 hrs |
| Density | semantic bits per raw bit | poetry ≫ photograph |
| Required priors | how much the audience must bring | conceptual ≫ landscape photo |
| Reproducibility | one-of-a-kind vs unlimited copies | sculpture · 1; song · ∞ (Benjamin's aura) |

The taxonomy is intentionally shallow (~3 levels) because the interesting variation lives in the sub-property profile of each leaf, not in deeper nesting.

3. Information-density estimates

Order of magnitude only. "Semantic bits" is intentionally fuzzy — read it as "bits in the receiver's updated model after one consumption."

| Art form | Raw bits (compressed) | Semantic bits transmitted | Ratio |
|---|---|---|---|
| Haiku | ~10² | 1–10 (the turn) | 10⁻¹ |
| Lyric poem | ~10³ | ~10² | 10⁻¹ |
| Pop song (3 min) | ~10⁷ | ~10² (affective trajectory) | 10⁻⁵ |
| Novel | ~10⁶–10⁷ | ~10⁴ (mental model + scenes) | 10⁻² |
| Painting (high-res scan) | ~10⁷–10⁸ | ~10²–10³ (subject + style impression) | 10⁻⁵ |
| Film (2 hr) | ~10¹⁰ | ~10⁵–10⁶ (story + sensory memory) | 10⁻⁴ |
| Architecture (a building) | open | mostly non-semantic: affordance, proprioception | n/a |
| Conceptual artwork | ~10² (the description) | up to 10² (if the meta-frame lands) | ~1 |

Two patterns:

  • The higher the bandwidth, the worse the ratio. Film has the richest joint channel but spends most of its bits on perceptual fidelity. The high-ratio forms (haiku, conceptual art) use almost no bits because they ride entirely on receiver priors.
  • Language is pre-compressed cognition. Poetry and prose can hit semantic ratios of 0.01–0.1 because natural language is itself the receiver's pre-trained codebook. Visual and auditory media must build their codebook in the work, paying the compression overhead each time.
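
The Ratio column can be recomputed from the other two. A minimal sketch, using only the order-of-magnitude figures from the table above; where the table gives a range, the geometric midpoint is taken, which is an assumption of this sketch and can shift a result by about half an order of magnitude against the table's rounded value:

```python
import math

# (raw_bits, semantic_bits), orders of magnitude from the table above.
# Ranges are collapsed to their geometric midpoint (an assumption of this sketch).
# The conceptual artwork uses the table's best case ("up to 10^2").
ESTIMATES = {
    "haiku": (1e2, math.sqrt(1 * 10)),
    "lyric poem": (1e3, 1e2),
    "pop song": (1e7, 1e2),
    "novel": (math.sqrt(1e6 * 1e7), 1e4),
    "painting": (math.sqrt(1e7 * 1e8), math.sqrt(1e2 * 1e3)),
    "film": (1e10, math.sqrt(1e5 * 1e6)),
    "conceptual artwork": (1e2, 1e2),
}


def log10_ratio(raw_bits: float, semantic_bits: float) -> float:
    """Base-10 exponent of the semantic-to-raw ratio."""
    return math.log10(semantic_bits / raw_bits)


ratios = {name: log10_ratio(*bits) for name, bits in ESTIMATES.items()}

# Print from densest to least dense.
for name, exponent in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} ratio ~ 10^{exponent:+.1f}")
```

Sorting by exponent puts the language-borne forms at the top and the high-bandwidth forms at the bottom, which is the first pattern stated above. Architecture is left out because the table gives it no numeric estimate.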

4. The six layers stacked in every work

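Independently of medium, every artwork lives at multiple layers simultaneously. The receiver decodes from the bottom up. The table below has seven rows (0–6) because it splits out perceptual primitives as a separate layer between signal and form, which the six-layer L1 skeleton does not; layer numbers cited in this section follow the table.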

| Layer | What it encodes | Example (a Vermeer interior) |
|---|---|---|
| 0. Signal | Physical artifact's bits | pigment positions, surface texture |
| 1. Perceptual primitives | Channel-native atoms | line · hue · luminance · depth cue |
| 2. Form / grammar | Local combination rules | one-point perspective · light-source consistency · figure-ground |
| 3. Style / idiolect | Recognisable codec | Dutch Golden Age domestic genre · Vermeer's specific light |
| 4. Work | This specific instance | Woman Holding a Balance (1664) |
| 5. Theme / referent | What the work is about | judgement · weighing · the still moment |
| 6. Meta-claim | What it asserts about art / reality | the dignity of attention to ordinary things |

Most works pack densely on layers 0–3 (the form) and point sparsely at 4–6 (the content). This asymmetry — most of the bits are formal, most of the meaning is at the top — is the central reason Shannon bits mislead anyone trying to measure art.

A useful diagnostic: when critics disagree, identify which layer they are arguing about. "It is not well-painted" (layers 1–3) and "it does not move me" (layers 5–6) are not the same claim.

5. Referential abstraction ladder

Orthogonal to medium and to layer-within-work, every artwork sits somewhere on a ladder of how abstract its referent is. The rungs (Peirce + Cassirer + standard art-history vocabulary):

| Rung | Definition | Example | Required priors |
|---|---|---|---|
| 1. Iconic | Resembles a specific thing | portrait of a known person; landscape photo | minimal; perception suffices |
| 2. Typical | Stands for a category | generic Madonna; "a tree" in cave painting | knowledge of the category |
| 3. Symbolic | Arbitrary sign for a meaning | red rose · love; cross · Christianity | the convention (cultural code) |
| 4. Allegorical | Narrative structure maps to abstract idea | Pilgrim's Progress; Animal Farm | the source ideas; reading-as-cipher |
| 5. Archetypal | Universal pattern of experience | hero's journey; trickster figure | comparative-cultural literacy |
| 6. Abstract | Structural relations themselves | Mondrian; serial music; color field | art-historical context for form-as-content |
| 7. Conceptual | The idea is the artwork | Duchamp's Fountain; LeWitt's instruction sets | the meta-frame of the art world |
| 8. Meta / process | Rules / system are the work | Fluxus scores; generative art | computational + art-historical priors |

Two important features:

  • The ladder is not strictly progressive in time. Conceptual art does not "replace" iconic — they coexist. The 20th-century art-historical story of "ever-rising abstraction" is one trajectory through this space, not the space itself.
  • Each step trades anchoring for generality. An iconic portrait is high-fidelity but commits to one person. An archetypal image is low-fidelity but compresses millions of human stories.

6. What each medium uniquely encodes (codomain)

The defensible claim: each art form has a codomain — a slice of experience it natively encodes that the others can only gesture at from outside. Listed in approximate order of how much of human experience the codomain covers:

| Medium | Codomain (what it natively encodes) | Failure modes when forced outside |
|---|---|---|
| Literature | Interiority · counterfactual · linguistic self-reflection. The only form where she wondered whether… is native. | Cannot deliver the unmediated sensory present; visual scenes are reconstructions in the reader. |
| Music | Temporal affective topology: feeling-shapes evolving over time. Other forms describe emotions; music is an emotion's time-derivative. | Cannot deliver specific propositional content. Programme music is a workaround, not native. |
| Cinema | The simultaneity of face + time + place + sound. Uniquely produces empathic identification with strangers. | Cannot deliver true interiority, only its outward signs (voice-over, expression). |
| Architecture | Inhabited spatial cognition: you cannot be inside a painting. Bandwidth via the body, latency over years. | Cannot deliver discrete narrative; cannot be "read" linearly. |
| Dance | Kinesthetic mirror-knowledge: what bodies-doing feels like, transmitted via mirror neurons. | Cannot deliver precise propositional content; ambiguity is structural. |
| Theatre | Embodied human encounter in real time, irreproducible. | Cannot deliver the impossible image (that is what cinema is for). |
| Painting / drawing | The frozen privileged gaze: one moment selected and held. | Cannot deliver time; the temporal dimension is the viewer's. |
| Photography | Indexical trace: this configuration actually happened. | Cannot deliver counterfactual or imagined content without manipulation; the indexical bond is the medium. |
| Sculpture | 3D form · material · gravity-relation, with parallax as the temporal axis. | Cannot deliver colour-as-light the way painting does (it has surface, not glow). |
| Conceptual art | Meta-level art-statements at near-zero artifact cost. | Cannot survive outside the art-world frame; the codec depends on the receiver knowing that this is art. |
| Ritual / performance art | The transformation of the participant in real time. Codomain is the participant's state, not an artifact. | Leaves no transmissible record without becoming something else. |

These claims are deliberately strong and therefore contestable. The weaker, safer version: every medium can gesture at every codomain, but it pays compression overhead doing so, and the more native medium will beat it on density.

7. The interestingness frontier

Birkhoff (1933) proposed M = O / C — aesthetic measure as order over complexity. Berlyne (1971) studied the inverted-U: too simple is boring, too complex is noise. Schmidhuber (1991–2009) sharpened both into the compressibility-progress hypothesis: an artifact is interesting exactly when consuming it improves the receiver's internal compressor — that is, when the artifact sits at the edge of compressibility for the current receiver.

This explains a great deal:

  • Maximally compressible work (minimalism, Rothko, drone, haiku) — low Kolmogorov, high semantic resonance per bit. The receiver's compressor handles it almost trivially; what is transmitted is the fact that so little can carry so much.
  • Maximally incompressible work (Pollock, free jazz, glitch, noise) — high Kolmogorov, semantic content carried by the gesture of resisting compression. The receiver learns that compression itself was the prior assumption.
  • Edge-of-compressibility work — most canon (Bach fugues, Vermeer interiors, late Beethoven, Borges, Tarkovsky). The receiver's model improves fastest here because the work is almost predictable, then breaks the prediction in a structured way.

The interestingness frontier is receiver-dependent. A trained listener finds Bach interesting where a naïve one finds it predictable; a naïve listener finds pop interesting where the trained one finds it predictable. The codec is the artifact + the receiver's prior model together, never one alone.
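
One hedged way to write the two measures in this section side by side. Birkhoff's ratio is as cited; the compression-progress line uses notation assumed here, not taken from the source: $L(x \mid \theta)$ is the code length in bits of artifact $x$ under receiver model $\theta$, and $\theta_t$, $\theta_{t+1}$ are the receiver's model before and after consuming $x$.

```latex
% Birkhoff (1933): aesthetic measure as order O over complexity C.
M = \frac{O}{C}

% Compression progress (assumed notation): bits the receiver saves on x
% by updating its model from \theta_t to \theta_{t+1}.
I(x) = L(x \mid \theta_{t}) - L(x \mid \theta_{t+1})
```

Under this reading $I(x)$ is near zero at both extremes: when $x$ already compresses almost fully under $\theta_t$ there is nothing left to save, and when $x$ is noise no update to the model shortens the code. Because $\theta_t$ belongs to the receiver, the same $x$ gives a different $I(x)$ for a trained and a naïve listener.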

8. Audience priors as part of the codec

This is the deepest consequence of the frame. Information is not in the artifact — it is in the gap between receiver's prior and receiver's posterior after consumption. So the same artifact at different priors is literally a different codec.

Examples:

  • Renaissance allegorical painting to a classically educated viewer vs a modern museum-goer: the same canvas transmits ~10× more semantic bits to the educated viewer because the iconographic code is in their prior.
  • Duchamp's Fountain to a 1917 viewer (the meta-frame is a shock, enormous information transfer) vs a 2026 viewer (the meta-frame is established, the gesture is now a quotation — much smaller transfer per encounter, but the gesture lives on as part of the language).
  • Benjamin's aura — mechanical reproduction collapses the one-of-a-kind property of the artifact, which is a sub-property of the codec, not metadata about it. The reproducible image is a different codec, with different transmission characteristics, even if the bits are identical.
  • Anthropological / cross-cultural reception — an artifact built for one community's priors transmits a different signal (often near-zero, or actively misleading) to another community. The art is not "universal" in the strong sense; the priors are local.

This is why training (art-historical literacy, musical ear training, literary reading practice) is a codec upgrade, not snobbery. More priors → wider channel → more bits delivered. No way around it.
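
The prior-to-posterior gap can be given a number. A minimal sketch, assuming the gap is measured as the Kullback-Leibler divergence from prior to posterior in bits, with a Bayesian update over three hypothetical readings of a painting. The readings, the probabilities, and the function names are all illustrative assumptions, not figures from the source:

```python
import math


def bayes_update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Posterior over readings after observing one feature of the artifact."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: v / total for h, v in unnormalised.items()}


def information_gain_bits(prior: dict[str, float], posterior: dict[str, float]) -> float:
    """KL(posterior || prior) in bits: how far the encounter moved the receiver."""
    return sum(p * math.log2(p / prior[h]) for h, p in posterior.items() if p > 0)


# Both viewers start with the same hypothetical beliefs about what the painting shows.
prior = {"domestic scene": 0.7, "portrait": 0.2, "allegory of judgement": 0.1}

# What differs is the decoding knowledge: P(sees a balance | reading).
# The naive viewer has no iconographic code, so the balance is equally likely everywhere.
naive_likelihood = {"domestic scene": 0.3, "portrait": 0.3, "allegory of judgement": 0.3}
# The fluent viewer knows a balance strongly signals the allegorical reading.
fluent_likelihood = {"domestic scene": 0.05, "portrait": 0.05, "allegory of judgement": 0.9}

naive_gain = information_gain_bits(prior, bayes_update(prior, naive_likelihood))
fluent_gain = information_gain_bits(prior, bayes_update(prior, fluent_likelihood))

print(f"naive viewer:  {naive_gain:.2f} bits")
print(f"fluent viewer: {fluent_gain:.2f} bits")
```

The artifact and the observed feature are identical in both runs. The naive viewer's beliefs do not move, while the fluent viewer gains a little over one bit, because the iconographic code sits in the receiver rather than on the canvas.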

9. Multimodal and the joint codec

Most real art forms are already multimodal (opera, film, theatre, dance with music, illustrated books). The joint codec is not the sum of its parts — it is a new codec with cross-modal redundancy and cross-modal disambiguation. See MIXTURES and MIXING-GENERALIZED.

Three effects worth naming:

  • Cross-modal redundancy — the score reinforces the image's affective valence; the actor's voice reinforces the line's meaning. This raises the channel's signal-to-noise but lowers density (the same information is carried twice).
  • Cross-modal disambiguation — the same line spoken with two different scores means two different things. Each modality disambiguates the others.
  • Cross-modal contradiction — the most powerful and rarest case. The image is calm; the music is dread; the gap between them is the signal. This is high-density and high-difficulty; it is also why cinema and opera, when good, hit harder than any single-channel form.
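
The first two effects match standard entropy identities. In the sketch below, $X$ and $Y$ are two modalities (say image and score) and $M$ is the intended meaning; all three symbols are assumptions introduced here for illustration.

```latex
% Joint channel: not the sum of its parts once the modalities share information.
H(X, Y) = H(X) + H(Y) - I(X; Y)

% Disambiguation: the second modality removes uncertainty about the meaning
% that the first modality leaves open.
H(M \mid X, Y) = H(M \mid X) - I(M; Y \mid X) \le H(M \mid X)
```

Redundancy is the case where $I(X; Y)$ is large, so the joint channel carries far less than $H(X) + H(Y)$ but is more robust to noise. Disambiguation is the case where $I(M; Y \mid X)$ is large. Contradiction does not reduce to either identity, since the signal there is the mismatch itself, which may be part of why it is the hardest case to design.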

10. Operational consequences

For makers:

  • Identify the slice of experience you are trying to transmit, then pick the medium whose codomain natively contains it. Fighting the medium is paying compression overhead with no payoff.
  • The medium chooses what is hard. Music has to work against propositional precision; literature has to work against the unmediated present; cinema has to work against interiority. Lean into where the medium is easy unless you have a deliberate reason to fight.
  • Pick your generalization level on the iconic-to-conceptual ladder consciously. Each step is a different audience, a different prior set, a different lifespan.

For critics:

  • Separate the four axes of "information" and the six layers of any work before making a value claim. Disagreement is usually about which axis / layer matters, not about facts.
  • "Good" is medium-relative — the same standard cannot judge a song and a building. The honest comparison is density of insight per prepared receiver.

For AI systems generating or evaluating art:

  • Optimise semantic bits per prepared receiver, not pixel fidelity or FID-against-training-distribution. The latter rewards style mimicry, not insight transmission.
  • Receiver model matters: a system that does not represent the receiver's priors cannot estimate how much information its output transmits. Aesthetic evaluation without a receiver model is style-matching only.
  • Multimodal generation should exploit cross-modal disambiguation (and occasionally contradiction), not just stack channels.

For receivers:

  • Bring more priors → receive more bits. The artwork has not changed; your channel got wider. Training, repetition, and reading-around are codec upgrades, not chores.
  • The right test for "did this work transmit anything" is not "did I like it" but "did my model of something update." Liking is downstream of update.

11. Open questions

  • Is there a clean information-theoretic theory of embodied art (dance, ritual, architecture)? Shannon's theory handles continuous channels, but it assumes a fixed channel that sits apart from the receiver; the embodied codecs are contingent and have the receiver's body as part of the medium. Possibly the right frame is rate-distortion with proprioception, but no one has built it crisply.
  • Are maximally compressible art (Rothko, drone) and maximally incompressible art (Pollock, noise) secretly the same gesture pointed in opposite directions — both reject the middle and force the receiver to update their model of what compression is?
  • Can large multimodal models be used to measure semantic density empirically? Perplexity of a prepared-receiver model before and after exposure is at least operationalizable.
  • Where does AI-generated art sit on the abstraction ladder when the process is the work? It looks like rung 8 (meta / process) by default, but most of the output gets framed as rungs 1–3 — an unstable mismatch.
  • How much of art's transmission is propositional vs how much is state-induction (the work changes the receiver's body / mood directly, no proposition required)? The codec frame as written is biased toward propositional; the state-induction half deserves equal treatment.
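
The perplexity question above can be made concrete at toy scale. A minimal sketch, assuming a character bigram model with add-one smoothing as a stand-in for the prepared-receiver model. The class name, the artifact string, and the probe string are hypothetical, and a real study would use a large multimodal model instead:

```python
import math
from collections import Counter, defaultdict


class BigramReceiver:
    """Toy receiver: a character bigram model with add-one smoothing."""

    def __init__(self, alphabet: str) -> None:
        self.alphabet = alphabet
        self.counts: dict[str, Counter] = defaultdict(Counter)

    def expose(self, text: str) -> None:
        """Update the receiver's model by consuming a text."""
        for prev, nxt in zip(text, text[1:]):
            self.counts[prev][nxt] += 1

    def bits_per_char(self, text: str) -> float:
        """Average code length of `text` under the current model (log2 of perplexity)."""
        total = 0.0
        for prev, nxt in zip(text, text[1:]):
            seen = self.counts[prev]
            prob = (seen[nxt] + 1) / (sum(seen.values()) + len(self.alphabet))
            total -= math.log2(prob)
        return total / (len(text) - 1)


alphabet = "abcdefghijklmnopqrstuvwxyz "
artifact = "the still moment weighs the still moment " * 20  # hypothetical work
probe = "the moment weighs still"                            # hypothetical probe text

receiver = BigramReceiver(alphabet)
before = receiver.bits_per_char(probe)   # untrained: log2(27) bits per character
receiver.expose(artifact)
after = receiver.bits_per_char(probe)
update_bits = before - after             # positive when the exposure helped

print(f"before: {before:.2f} bits/char, after: {after:.2f} bits/char")
```

The difference `before - after` is the per-character saving on the probe, a crude empirical reading of "bits in the receiver's updated model after one consumption". Choosing the probe is the hard part: it decides which of the receiver's beliefs the measurement is sensitive to.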

References

  • Shannon, C. E., "A Mathematical Theory of Communication" (1948). Provides the information-theoretic bandwidth framework underpinning the codec framing.
  • Kolmogorov, A. N., "Three approaches to the quantitative definition of information" (1965). Grounds the page's use of shortest-program length (Kolmogorov complexity) as a measure of irreducible structure in an artifact.
  • Peirce, C. S., Collected Papers (~1900). Source of the icon/index/symbol triad used to build the referent ladder.
  • Cassirer, E., Philosophy of Symbolic Forms (1923–1929). Underpins the claim that every art form is a distinct symbolic system, not a degraded form of language.
  • Benjamin, W., "The Work of Art in the Age of Mechanical Reproduction" (1936). Source for the aura concept and how mechanical reproduction affects the codec's uniqueness axis.
  • Birkhoff, G., Aesthetic Measure (1933). Quantitative aesthetics baseline; equation M = O/C relates order to complexity.
  • Berlyne, D. E., Aesthetics and Psychobiology (1971). Psychobiological grounding for why novelty-complexity tradeoffs affect receiver response.
  • Schmidhuber, J., "Formal Theory of Creativity and Fun" (1991–2009). Source for interestingness ≈ compression-progress; links aesthetic value to learning-rate improvement in the receiver.

See also