Sixty Metadata Fields: Craig Ball's 2026 ESI Refresh

Craig Ball's Annotated ESI Protocol refresh doubled the metadata addendum, gave Slack its own section, and added a two-tier framework for ephemeral data.

By Claude and Gemini with Sid Newby | May 2026

The 2023 version of Craig Ball's Annotated ESI Protocol fit in 33 pages. The 2026 refresh runs 50.^[1] Most of the new ink is about Slack. The rest is about every other place a custodian now talks, types, or pastes a link instead of attaching a file. Ball quietly posted the revision on May 1; Doug Austin wrote it up at eDiscovery Today on May 11.^[1]^[2] Outside a small circle of e-discovery practitioners, nobody seems to have noticed that the most-referenced ESI protocol template in the country just doubled its metadata addendum and added a tiered framework for forensic collection — which means thousands of ESI protocols are about to get negotiated against a document the negotiating attorneys have not read.

The Annotated ESI Protocol is not a court rule. It is a template, with annotations explaining each provision and citing the authorities behind it. Ball published the original on his own blog in January 2023, and for three years it has been the single most-downloaded ESI protocol reference in the United States.^[3] EDRM mirrored it. Hundreds of judges have referenced it in chambers conferences. Law schools have built courses around it. When a litigation team sits down to draft an ESI protocol for a federal matter and starts with somebody else's language, they start with this one or with a derivative of this one. The 2023 version assumed email, loose files, and Office documents. The 2026 version assumes Slack, Teams, Google Chat, iMessage, Signal, Loop, cloud-hosted attachments, video, voicemail, foreign-language data, and generative AI in the review workflow. The gap between those assumptions is the gap between how custodians worked when the original was written and how they work now.

The relevant operational fact is that nobody's tooling produces all of this out of the box. The protocol acknowledges that, in language about proportionality and platform capability. Vendors will spend the next eighteen months catching up. Negotiating parties will spend the next eighteen months arguing about which of them gets to absorb the cost.

What Actually Changed

The headline numbers tell the structural story. The protocol's body grew by roughly 17 pages, mostly to accommodate eight new or substantially expanded sections.^[1] Metadata Addendum A — the schedule that defines the fields a producing party has to deliver alongside production media — expanded from 27 fields to approximately 60 fields, organized into 11 labeled subsections.^[2] That doubling is not cosmetic. The new fields cover things that did not have stable names in 2023: conversation identifiers for chat platforms, edit and deletion flags for messages that were retroactively modified, modern-attachment pointers that distinguish a hyperlinked file from a true attachment, and a "platform tier" tag that lets the receiving party see at a glance whether a given message came from a Tier 1 consumer-grade extraction or a Tier 2 forensic image.^[1]^[2]

The new substantive sections, in order of how much trouble each one resolves at a meet-and-confer:

Modern attachments. Cloud-hosted files referenced by URL, sent via Outlook's cloud-attachment default or Gmail's Drive-link UI. Five years of federal rulings produced a patchwork on whether the producing party has to capture the contemporaneous version of the file as-sent or the current version as-stored. The protocol's new exemplar language picks a default — contemporaneous, where technically feasible — and adds a LINKBEGBATES metadata field that ties each pointer to the produced version of the linked document, with the modification timestamp captured. This is the operational version of the doctrinal debate we covered in the hyperlinks post last week.^[4]

Collaboration platforms. Slack, Teams, Google Chat, and any equivalent thread-and-channel surface get a dedicated section requiring three things in parallel: a native export (typically JSON for Slack, an EDS PST for Teams, or the Google Vault export format), a human-readable rendered transcript that an attorney can review without specialized tooling, and separately produced attachments where the platform supports them.^[2] None of those three is sufficient by itself. The native export preserves metadata but is unreadable to most reviewers. The rendered transcript is readable but lossy. Attachments alone, without context, are useless. The producing party owes all three, with a proportionality qualifier that acknowledges some platforms make some of this difficult.

Mobile and ephemeral messaging. A two-tier framework. Tier 1, for standard civil cases, accepts consumer-grade backup-extraction tools — iMazing for iOS backups, similar utilities for Android — as defensible. Tier 2, for matters involving alleged spoliation, deleted-content recovery, or device-integrity disputes, requires forensic-grade collection (Cellebrite UFED, Magnet AXIOM) and explicit documentation of the extraction methodology.^[1] The line between the tiers is the moment somebody alleges spoliation. Until that moment, the protocol authorizes lower-cost collection.

Audio, video, and voicemail. Native-production language was added, with metadata fields for runtime, codec, source application, and transcription provenance. This section was a footnote in 2023; it is a section in 2026 because Zoom and Teams recordings are now systematically subject to discovery, and because AI-generated transcripts are themselves discoverable ESI that the producing party has to deliver with chain-of-custody documentation.

Search methodology. Expanded. The 2023 protocol gestured at search-term lists. The 2026 protocol adds language about hybrid retrieval, query expansion, and the validation set requirements that have to follow when the producing party uses retrieval beyond keyword matching. That section ties directly to the section on technology-assisted review, which was rewritten to address modern TAR — including the 2025 line of authority on whether a producing party even has to disclose its TAR methodology, a debate that the AI-generated ESI privilege ruling in Heppner muddled further.

Generative AI in review. A new section that establishes a disclosure and validation framework for review tools that use generative AI in any part of the workflow — coding suggestions, draft summaries, privilege classification, or end-to-end agentic review. The protocol does not ban any of it. It requires that the producing party disclose what was used, validate the output against a sample set, and produce the validation results. The implication: a producing party can use Cecilia, Relativity aiR, or any other GenAI review tool, but the burden of validation falls on the user, not the vendor. This is the protocol-level encoding of what Warner v. Gilbarco, Heppner, and Morgan v. V2X told us in Q1.

Foreign-language materials. A new exemplar provision addressing collection, translation methodology, and production format for multilingual data sets. The 2023 protocol acknowledged the topic existed and stopped. The 2026 version specifies that machine-translation output is acceptable as a starting point but requires human review for documents flagged as responsive, with explicit fields for translator identification and translation date.

Expanded Metadata Addendum A. Already discussed; the 60-field, 11-subsection structure is the main load-bearing change.

Figure 1: Structural comparison of the 2023 and 2026 Annotated ESI Protocol templates. New sections in the 2026 refresh (pink) cover communication and data types that either did not exist or were not in widespread custodian use when the original was written.
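
The contemporaneous-version default for modern attachments can be made concrete with a short sketch. This is an illustration only, not language or logic taken from the protocol. The FileVersion structure, the sample Bates numbers, and the choice of SHA-256 for the hash are all assumptions. The four output field names are the ones the addendum uses for modern attachments.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class FileVersion:
    """One stored version of a cloud-hosted file (hypothetical structure)."""
    path: str
    modified: datetime
    content: bytes
    beg_bates: str  # Bates number assigned when this version is produced


def contemporaneous_version(versions, sent_at):
    """Return the latest version modified at or before the message was sent.

    Returns None when no stored version predates the message. That is a
    gap to document and discuss, not something to resolve silently.
    """
    eligible = [v for v in versions if v.modified <= sent_at]
    return max(eligible, key=lambda v: v.modified) if eligible else None


def link_fields(version):
    """Build the modern-attachment fields for a load-file record."""
    return {
        "LINKBEGBATES": version.beg_bates,
        "LINKEDFILEPATH": version.path,
        "LINKEDVERSIONTIMESTAMP": version.modified.isoformat(),
        # SHA-256 is an assumption; the hash algorithm is not specified here.
        "LINKEDFILEHASH": hashlib.sha256(version.content).hexdigest(),
    }


def utc(day):
    """Helper for the example data: 09:00 UTC on a day in March 2025."""
    return datetime(2025, 3, day, 9, 0, tzinfo=timezone.utc)


history = [
    FileVersion("/drive/budget.xlsx", utc(1), b"v1", "ABC0000101"),
    FileVersion("/drive/budget.xlsx", utc(5), b"v2", "ABC0000102"),
    FileVersion("/drive/budget.xlsx", utc(9), b"v3", "ABC0000103"),
]
# The link was sent on March 6, so the March 5 version is the one as-sent.
chosen = contemporaneous_version(history, datetime(2025, 3, 6, 12, 0, tzinfo=timezone.utc))
print(link_fields(chosen))
```

The "where technically feasible" qualifier shows up here as the None return: if the platform kept no version history, there is nothing for the selection step to choose from.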

The Three-Legged Stool for Slack and Teams

The collaboration-platform section deserves a closer look because it is where the operational pain lives. The protocol's requirement that the producing party deliver native export plus rendered transcript plus separately-produced attachments is a direct response to the production patterns that have caused the most fights in 2024 and 2025.

Take Slack first. A Slack workspace export comes out as JSON files, one per channel or DM, with a separate file for users and a separate file for attachments.^[5] The JSON preserves message timestamps, thread relationships, edit history, reaction emoji, and the user IDs of authors and recipients. It is also, for any reviewer not running a Slack-specific tool, unreadable. A reviewer who opens the JSON in a text editor sees {"client_msg_id":"...","type":"message","subtype":"thread_broadcast","ts":"...","user":"U03ABCD12"} and gives up. So the protocol requires the producing party to also deliver a human-readable rendered transcript — the same conversation, formatted so an attorney can read it like a long email thread. And it requires attachments produced separately, with metadata tying each attachment back to the message it was posted to.
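
To show what the rendered-transcript leg involves, here is a minimal sketch that turns Slack-style message records into readable lines. It assumes each message carries ts, user, and text, with thread_ts and edited where present, and that the users file supplies id, name, and real_name. The render_transcript function and its output layout are hypothetical, not a vendor's or the protocol's format.

```python
from datetime import datetime, timezone


def render_transcript(messages, users):
    """Render Slack-style message dicts as human-readable transcript lines."""
    # Map user IDs to display names, falling back to the handle or raw ID.
    names = {u["id"]: u.get("real_name") or u.get("name", u["id"]) for u in users}
    lines = []
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        author = names.get(msg.get("user"), msg.get("user", "unknown"))
        # A reply carries a thread_ts that differs from its own ts.
        is_reply = msg.get("thread_ts") not in (None, msg["ts"])
        indent = "    " if is_reply else ""
        edited = " (edited)" if "edited" in msg else ""
        text = msg.get("text", "")
        lines.append(f"{indent}[{when:%Y-%m-%d %H:%M:%S} UTC] {author}: {text}{edited}")
    return "\n".join(lines)


users = [{"id": "U03ABCD12", "name": "jdoe", "real_name": "Jane Doe"}]
messages = [
    {"type": "message", "ts": "1700000060.000200", "thread_ts": "1700000000.000100",
     "user": "U03ABCD12", "text": "Reply in thread"},
    {"type": "message", "ts": "1700000000.000100", "thread_ts": "1700000000.000100",
     "user": "U03ABCD12", "text": "Parent message",
     "edited": {"user": "U03ABCD12", "ts": "1700000030.000000"}},
]
print(render_transcript(messages, users))
```

The sketch sorts strictly by timestamp and indents replies, so a reply posted after unrelated top-level messages appears away from its parent. A production renderer would group replies under the parent, and would also have to handle reactions, attachments, and deletions. That is where the lossiness described above comes from.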

Teams is worse. Microsoft Teams stores chat data in a hidden Exchange folder per user (SubstrateHolds and similar), with a ThreadID header on each message that lets Relativity, Reveal, and the rest reconstruct conversations.^[6] But Teams chat is not Slack JSON. Native export typically means a PST containing message records, with attachments riding along, and the rendered-transcript step requires conversion through a short-message converter that vendors implement in materially different ways.^[7] The reactions, the inline link unfurls, the @-mentions, the "this message was edited" flag — all of those are metadata in the source that may or may not survive the export, depending on the tooling. The protocol says: produce all three. The protocol does not say: the vendor your client uses can do this without manual workarounds.

Google Chat — the smallest of the three platforms by market share but the one with the cleanest export API — comes through Google Vault as a series of mbox files or as text. The protocol applies the same three-legged structure to it. The proportionality qualifier matters most for Google Chat, because Workspace customers without a Vault subscription have meaningfully degraded export capability.

The honest operational consequence is that the producing party's tooling, vendor, and budget determine whether all three legs of the stool can actually stand. The protocol's response is a meet-and-confer trigger: if the producing party cannot deliver all three, the parties have to talk about it, document the gap, and either negotiate a substitute or escalate to the court. That trigger is the part of the new section that practitioners will use most. It moves the inability to produce a clean transcript from "the producing party's problem to hide" to "the meet-and-confer agenda's first bullet."

Platform	Native Format	Rendered Transcript	Attachments	Proportionality Concern
Slack	JSON per channel/DM	Required, vendor-rendered	Separate, linked by message ID	Channel/DM scope; private channel access
Microsoft Teams	PST or EDS export	Required, conversion through ThreadID grouping	Separate, with cloud-attachment handling	Hidden Substrate folders; Copilot Pages
Google Chat	Vault mbox/JSON	Required	Separate	Workspace tier (Vault dependency)
Loop / Coda / Notion	Vendor-specific	Required where feasible	Embedded vs. linked distinction	Block-level vs. document-level export

Table 1: How the protocol's three-legged stool maps to each major collaboration platform. The proportionality concerns column is where the meet-and-confer fights will happen. Source: Annotated ESI Protocol 2026 and platform documentation.^[1]^[5]^[6]^[7]

The Two-Tier Framework Is the Diplomatic Achievement

The mobile-and-ephemeral-messaging section is the part of the refresh that will save the most billable hours over the next two years. It also encodes a compromise that most ESI protocol drafters have been making informally for a decade without anyone admitting it.

Pre-2026, the binary choice in mobile collection was: forensic image of the device (expensive, slow, intrusive) or backup extraction (cheap, fast, contested). Forensic image meant Cellebrite or AXIOM, an examiner's chain of custody, and an invoice in the low five figures for a single phone. Backup extraction meant iMazing or a similar utility, a vendor's process documentation, and an invoice in the low three figures. The defensibility gap between the two was real in cases involving deleted-content allegations and almost completely irrelevant in cases without them. So practitioners spent years doing backup extractions for routine matters, then quietly escalating to forensic collection when somebody got accused of deleting Signal messages.

The 2026 protocol writes that practice down. Tier 1, for standard civil cases, treats consumer-grade backup-extraction tools as defensible. The protocol authorizes it explicitly: a party that does Tier 1 collection in a matter where no spoliation has been alleged has produced under the protocol. Tier 2, triggered by an allegation of spoliation, a request for deleted-content recovery, or a device-integrity dispute, requires forensic-grade collection with full chain-of-custody documentation.^[1] Cellebrite UFED and Magnet AXIOM are named as exemplars. The shift from Tier 1 to Tier 2 is a meet-and-confer event with its own paperwork.

Figure 2: The Tier 1 / Tier 2 escalation logic from the 2026 protocol's mobile and ephemeral messaging section. Tier 1 is the default for routine civil matters; any of three triggers moves the matter to Tier 2 forensic collection with full chain-of-custody documentation.
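
The escalation logic reduces to a small decision function. The sketch below is a simplification under stated assumptions. The TierTriggers class and its field names are hypothetical labels for the three triggers described above. The protocol treats the shift as a meet-and-confer event with its own paperwork, not as an automatic switch.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TierTriggers:
    """The three escalation triggers, as hypothetical boolean flags."""
    spoliation_alleged: bool = False
    deleted_content_recovery_requested: bool = False
    device_integrity_disputed: bool = False


def collection_tier(triggers):
    """Return (tier, reasons): Tier 2 if any trigger is set, otherwise Tier 1."""
    reasons = [name for name, is_set in asdict(triggers).items() if is_set]
    return (2 if reasons else 1), reasons


# Routine matter: no trigger set, so backup extraction remains the default.
print(collection_tier(TierTriggers()))
# A spoliation allegation moves the matter to forensic collection.
print(collection_tier(TierTriggers(spoliation_alleged=True)))
```

Returning the reasons alongside the tier keeps a record of which trigger caused the escalation, which is the kind of detail the chain-of-custody documentation would need.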

The achievement is not the framework itself. The achievement is the cost discipline the framework enables. A plaintiff's firm that cannot afford a $15,000 forensic phone image for every custodian in a five-custodian matter can now point to the protocol and say: under Tier 1, here is what we are obligated to produce, here is what it costs, and if you want Tier 2 you need to make a specific spoliation allegation that survives the meet-and-confer. That is a meaningful redistribution of negotiating power. It is also a redistribution that the producing party gets to use the same way: the defendant in a routine employment matter does not have to image every executive's phone because plaintiff's counsel asked nicely.

The DOJ and FTC's January 2024 joint guidance on ephemeral messaging preservation remains the regulatory backdrop.^[8] The 2026 protocol is consistent with the guidance — it requires preservation, documents the collection method, and accommodates the possibility that an ephemeral platform's design will defeat collection. The guidance was process; the protocol is operational language to put in a court order.

What the Sixty-Field Addendum Actually Demands

Metadata Addendum A's expansion from 27 to 60 fields is the part of the refresh that will hit vendor product roadmaps hardest. The new fields are not generic. They are platform-specific and workflow-specific, and many of them require the producing party's collection vendor to do work that most collection vendors do not currently do by default.

The 11 subsections of the new addendum group fields by source type. There is a subsection for email metadata that mostly mirrors the 2023 fields. There is a subsection for loose-file metadata that adds a few new fields. The novelty is in the eight subsections that didn't exist before:

Collection-tool disclosure. Fields identifying the tool, version, and operator that produced the export. Auditable. Tied to defensibility.
Modern attachment metadata. LINKBEGBATES, LINKEDFILEPATH, LINKEDVERSIONTIMESTAMP, LINKEDFILEHASH. The fields that make a hyperlinked attachment review-able and produce-able as a unit.
Conversation identifiers. Thread ID, channel ID, workspace ID, parent message ID. The fields that let a reviewer reconstruct a conversation from a flattened production.
Edit and deletion flags. Whether a message was edited, when, and what the original said where the platform preserves it.
Platform tier. Whether the source was a Tier 1 or Tier 2 collection. This field flows from the mobile/ephemeral section but applies wherever the tier matters.
Reaction/inline metadata. Emoji reactions, @-mentions, link unfurl content. The metadata that gives a chat message its full meaning and that most current exports drop on the floor.
Audio/video provenance. Codec, runtime, transcription source, transcription model version. The fields that let a reviewer evaluate whether an AI transcript is reliable.
Translation provenance. Translator identity, translation date, machine-translation model where applicable.

Most current collection workflows produce a fraction of these fields. The vendors that produce the most — Microsoft Purview, Google Vault for Workspace, Slack Enterprise Grid's compliance export, Cellebrite UFED for mobile — produce different subsets, with different naming conventions. The protocol's contribution is a standardized field map. A producing party that adopts the protocol can hand the receiving party a clean schema. A receiving party can demand the schema. A judge can enforce the schema.
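
A standardized field map lends itself to a mechanical check: rename each vendor's fields to the shared schema, then list what is still missing. The sketch below assumes a tiny subset of the schema. The four LINK field names come from the addendum as described above. THREADID, CHANNELID, and PLATFORMTIER are hypothetical names for the conversation-identifier and platform-tier fields, and the vendor aliases are invented for illustration.

```python
# Subset of a target schema. The last three names are hypothetical.
REQUIRED_FIELDS = [
    "LINKBEGBATES",
    "LINKEDFILEPATH",
    "LINKEDVERSIONTIMESTAMP",
    "LINKEDFILEHASH",
    "THREADID",
    "CHANNELID",
    "PLATFORMTIER",
]

# Invented vendor-specific names mapped onto the shared schema.
VENDOR_ALIASES = {
    "conversation_id": "THREADID",
    "channel": "CHANNELID",
    "link_bates": "LINKBEGBATES",
}


def normalize(record, aliases):
    """Rename vendor-specific keys to schema names; leave other keys as-is."""
    return {aliases.get(key, key): value for key, value in record.items()}


def missing_fields(record, required):
    """List required fields that are absent or empty in a normalized record."""
    return [name for name in required if record.get(name) in (None, "")]


vendor_record = {
    "conversation_id": "T-001",
    "channel": "C-legal",
    "link_bates": "ABC0000102",
    "LINKEDFILEPATH": "",
}
normalized = normalize(vendor_record, VENDOR_ALIASES)
print(missing_fields(normalized, REQUIRED_FIELDS))
```

Run against a sample export, the missing-field list doubles as a first draft of the gap a producing party would have to disclose.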

The vendors who will scramble are the smaller forensic providers and the mid-market eDiscovery platforms that have not invested in collaboration-platform native handling. The vendors who will benefit are the platform-specific specialists (Hanzo, Onna, the cloud-discovery wing of Lighthouse) and the integrated platforms (RelativityOne, Everlaw, DISCO) that already do most of this. Vendors caught in the middle — generalist forensic firms that have been selling Cellebrite outputs unchanged for a decade — have a quarter or two to catch up before the protocol's field map becomes the courtroom default.

Meet and Confer Just Got Harder

The protocol's structural choice, encoding capability gaps as meet-and-confer triggers, moves a lot of work to the front of the case. That is consistent with how Rule 26(f) of the Federal Rules of Civil Procedure was designed to operate, and inconsistent with how most ESI conferences actually run in federal court.^[9]

Recent decisions have been making the cost of an inadequate ESI conference more visible. In 777 Partners, LLC v. Leadenhall Capital Partners LLP (S.D. Fla. March 28, 2025), the court denied a motion to compel and closed discovery after the moving party refused, during a February 10 meet-and-confer, to share search terms or discuss data depositories — and then delayed producing terms until early March.^[10] The court's order treated the meet-and-confer failure as a discovery-management problem the moving party had to live with, not as a procedural foot-fault the court would forgive. Sheikh v. Spinnaker Insurance Co. (December 23, 2025) went further: the court denied a motion for production without redactions and issued an order to show cause why the moving counsel should not be sanctioned for filing without meeting and conferring.^[11] These are not outlier rulings. EDRM's commentary on ESI protocol practice in 2025 documented a steady drift toward courts treating ESI conferences as substantive obligations with teeth, not boxes to check.^[12]^[13]


Figure 3: How the 2026 protocol's capability-gap mechanism is supposed to operate in the Rule 26(f) timeline. The producing party identifies what its tooling cannot deliver, the parties negotiate or escalate, and the court rules on a documented record — not on the back-channel claim that "our vendor doesn't do that."

The drafting consequence for the lawyers who will spend the rest of 2026 negotiating ESI protocols is that the 2023 starting template no longer covers their case. A plaintiff's firm that pulls last year's protocol off the shelf and files it without updating the collaboration-platforms section, the metadata addendum, and the tier framework is filing a protocol that does not match what their client's custodians actually do. A defense firm that responds to that protocol with boilerplate objections is, increasingly, responding to a moving target the court has more sympathy for than the firm expected.

The honest read is that the protocol's framework rewards parties that come to the meet-and-confer with a clear picture of their own data and punishes parties that come with vague gestures at "we'll figure it out." That is not a new dynamic. It is a dynamic that the new template makes more legible.

What the Vendors Should Do Before Q3

The standardized field map and the three-legged-stool requirement for collaboration platforms will become the de facto specification that buyers test vendors against. The vendors that are ready, and that can tell a credible compliance story, will get the renewals. The vendors that aren't will lose business slowly, in a way that does not show up in any single quarter, until somebody at a buyer's procurement meeting pulls up the protocol's metadata addendum and asks the vendor to produce a sample export against it.

Specific actions the vendors should take in the next quarter:

Map every metadata field in the 2026 addendum to an internal field in your collection and processing pipeline. If a field is not produced by default, document why and what it would take to produce it. The gap analysis is the salesperson's most important document for the next renewal cycle.

Implement a Tier 1 / Tier 2 indicator on every mobile or ephemeral collection job. The receiving party will ask. The producing party will need to know. The vendor that does not tag this in the export creates a defensibility hole in every matter.

Build a collaboration-platform export that delivers all three legs of the stool by default. Native, rendered transcript, separated attachments. Not as add-ons. Not as professional-services engagements. As the default behavior.

Publish a generative-AI-review validation harness. The protocol's GenAI section requires disclosure and validation. The vendor that ships a turnkey validation pipeline will sell more software than the vendor that tells customers to figure it out.

Update every ESI protocol template the firm publishes. The firms that distribute templates — Norton Rose, K&L Gates, Kilpatrick, and a long list of mid-market firms — are about to find out which of their templates have been left to rot since 2023.
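
The validation harness in the action list above could start as something very small: compare the tool's responsiveness calls against human coding on a sample set and report the results. As described here, the protocol requires validation against a sample set but does not name metrics. Precision and recall are an assumption borrowed from common review-validation practice, and validation_report is a hypothetical name.

```python
def validation_report(human, ai):
    """Compare AI responsiveness calls to human coding on a shared sample.

    Both arguments map document IDs to True (responsive) or False.
    Only documents coded by both are counted.
    """
    ids = human.keys() & ai.keys()
    tp = sum(1 for d in ids if human[d] and ai[d])      # both say responsive
    fp = sum(1 for d in ids if not human[d] and ai[d])  # AI over-calls
    fn = sum(1 for d in ids if human[d] and not ai[d])  # AI misses
    return {
        "sample_size": len(ids),
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        # None signals an undefined ratio, not a score of zero.
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }


human_coding = {"DOC1": True, "DOC2": True, "DOC3": False, "DOC4": False, "DOC5": True}
ai_coding = {"DOC1": True, "DOC2": False, "DOC3": True, "DOC4": False, "DOC5": True}
print(validation_report(human_coding, ai_coding))
```

A real harness would also record how the sample was drawn, and the tool and model version used, since the disclosure obligation covers what was used as well as how it performed.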

The opportunity for the vendors who execute on this is real. Most of the operational pain in eDiscovery in the last three years has been on the collection side, and the 2026 protocol's contribution is to make the collection-side requirements legible enough that buyers can evaluate them. That makes the collection vendors who deliver the requirements look better, and the ones who don't look worse. We covered the collection-market inversion last week — the data was already pointing this direction, and the protocol just gave the market a vocabulary to express it.

What This Means for Cases That Cannot Afford Cellebrite

The under-discussed contribution of the two-tier framework is that it gives a plaintiff's firm — or any party with a smaller budget than its opponent — a defensible way to do collection within a realistic cost envelope. A solo practitioner handling an employment case in the Eastern District of Texas cannot price a $15,000 forensic image into the matter without making the case uneconomic. Under the 2026 protocol, the practitioner does not have to. Tier 1 is the default. The other side can argue for Tier 2, but the argument has to be grounded in a specific allegation, and the producing party can document its compliance with Tier 1 against a published standard.

The same logic applies in the other direction. A regional law firm that runs Microsoft 365 and uses Teams as its primary internal chat surface can now point to the protocol's collaboration-platform section and say: we will produce a native export, a rendered transcript, and separated attachments using Purview, with the specific field map in Addendum A. The receiving party knows what to expect. The producing party knows what it owes. The court has a clean document to enforce against.

That is not an abstract benefit. The gap between what big-budget defendants can produce and what small-budget plaintiffs can demand has been quietly widening for a decade. The protocol's standardized field map and tier framework close a piece of that gap by giving the smaller party a published reference to negotiate against, which few practice documents manage. The protocol is not a court rule, and it does not bind any judge. But it is the most widely cited ESI protocol template in federal practice, and the judges who care about discovery management have shown a steady willingness to enforce the template's framework where the parties have adopted it.

The remaining work is enforcement. The vendors who actually produce against the 2026 specification will win renewals. The vendors who don't will fade slowly. The lawyers who actually read the 50 pages will negotiate better protocols. The lawyers who don't will keep filing 2023 boilerplate against opponents who have updated. Three years from now, when the 2029 refresh comes out, the practitioners who built operational fluency in the 2026 framework will be the ones who shaped what comes next.

Until then, every firm that drafts an ESI protocol this year should download the 50-page document, read it twice, and rewrite their template. The current template is wrong. The custodians have moved. The protocol followed them. The drafters get one quarter to catch up before opposing counsel notices.