Model Artifacts Are ESI: DOJ's New HSR Discovery Map

DOJ Antitrust now uses AI in investigations. The RealPage decree treats model artifacts and runtime logs as discoverable. HSR preservation just expanded.

By Claude and Gemini with Sid Newby | May 2026

The court-appointed monitor in United States v. RealPage can demand the company's source code, its model training documentation, and its runtime logic, and can hire outside experts at RealPage's expense to interpret what the monitor finds.^[1]^[2] Read that sentence carefully. A monitor empowered by the consent decree filed November 24, 2025 now has subpoena-equivalent reach into three categories of artifacts. The training data a pricing algorithm was built on. The configuration files that govern its production behavior. The version history showing how the model has changed. Most HSR preservation protocols I have seen in production deployments would not have flagged any of those as documents to put on hold. The producing party's litigation hold reaches custodians, mailboxes, and shared drives. The DOJ's view of what counts as relevant evidence reaches deeper than that, and the RealPage decree is the first public document where the depth gets written down.

Then on May 13, 2026, an MLex reporter cited an Antitrust Division official saying the DOJ now uses AI to detect and investigate anticompetitive conduct.^[3]^[4] The official did not name the tools. The Department has not released the underlying remarks. We know the practice is happening; we cannot see what tools are running against the documents producing parties hand over. The Antitrust Division has joined the SEC's Division of Enforcement and the FTC's Bureau of Competition. Call it the small, undocumented club of US enforcers that won't tell you which models they run against the materials you produce.

That asymmetry is the structural change. Everything else in this post follows from it.

What May 13 Actually Disclosed

The MLex piece is the load-bearing source for the public claim. ComplexDiscovery covered it on May 15 and observed, accurately, that the Division has not released the speech transcript or the slides that accompanied the remarks.^[4] The official quote — that AI is being deployed "to detect and investigate anticompetitive conduct" — leaves every operational question open. Which tools. Which workflows. Which case types. What inputs. What outputs. Whether the AI's classifications get reviewed by an attorney before they shape a subpoena, a civil investigative demand, or a Second Request specification list. Whether a producing party gets to know that an AI-assisted analysis influenced the scope of the request they are responding to.

The Division's silence on these points is consistent with how the SEC has handled its own AI deployment. The SEC's Enforcement Division has been running pattern-detection on EDGAR filings and trading data for years. The most a regulated entity ever learns about that work is the inference that something tripped a flag. The learning happens after the inquiry letter arrives. Producing parties operate downstream of analytics they cannot see and cannot validate. The Antitrust Division is now in that posture.

Why this matters operationally: in a Second Request response, the producing party makes proportionality arguments about scope. Those arguments rely on a shared understanding of what makes a document relevant. When the requesting agency is running its own AI scoring against productions in real time — sorting, prioritizing, pattern-matching across custodians — the producing party's TAR validation is no longer a private exercise. It becomes a defensive document. If the DOJ's AI surfaces a custodian or a data type the producing party deprioritized, the producing party has to explain, in the language of recall and precision, why the deprioritization was defensible. That is not how Second Request meet-and-confers have historically worked.

Sequence diagram of an HSR Second Request workflow with the DOJ's AI stack inserted between the filing and the production review

Figure 1: The Second Request workflow with the DOJ's AI stack inserted. The producing party never sees the AI nodes directly but has to defend production methodology against their output.

RealPage Is the Discovery Roadmap

The RealPage consent decree, filed in the Middle District of North Carolina on November 24, 2025, is the most consequential public document the Antitrust Division has produced on what algorithmic-pricing evidence looks like.^[1]^[5] The substantive antitrust restrictions get most of the press coverage. RealPage has to retrain pricing models on data aged twelve months or older. End real-time use of nonpublic competitor data. Stop geographic modeling below the state level.^[1] Those terms matter for the company's product roadmap. What matters for everybody else with a pricing algorithm — or, for that matter, any algorithm that processes competitively sensitive data — is the schedule of records the court-appointed monitor can demand.

Reed Smith's analysis of the decree is direct on this point. The monitor has "broad access to review RealPage's code, model training documentation, and runtime logic."^[2] Paul Weiss adds that the settlement identifies three risk surfaces for any algorithmic-pricing tool.^[6] Runtime operations using nonpublic competitor data. Model training on insufficiently aged data. Automation features that lock users into default pricing recommendations. Each of those surfaces generates discoverable artifacts. Runtime operations leave logs. Model training leaves training datasets, version artifacts, and engineering tickets. Default-automation features leave configuration files and user-interaction telemetry. The DOJ asked for and received a settlement architecture that treats all of those as evidence.

Now translate that into HSR preservation language. The traditional Second Request preservation map covers email, loose files, and structured business records. The DOJ and FTC updated their preservation specifications in January 2024 to add chat platforms like Microsoft Teams, Slack, Google Chat, and ephemeral apps used for business purposes.^[7] What RealPage now adds, by implication, is a fifth category: the algorithmic systems themselves. The training data they were built on. The model artifacts produced during training. The version history of those artifacts. The configuration files that govern runtime behavior. The output logs that record what recommendations the system produced and when. The engineering tickets that capture every parameter change the modeling team has made.

A producing party in an antitrust investigation that involves any algorithmic system now has to assume that all of those artifacts are within the agency's view of relevance. That is not because the agency has explicitly added them to the standard Second Request specification list, but because the agency has just demonstrated, in a published consent decree, that it considers them relevant. "Algorithmic system" here means pricing, demand forecasting, recommendation, or anything else that processes sensitive market data.

ESI Category	Traditional Second Request Scope	Post-RealPage Implied Scope
Email	Custodian mailboxes, deleted-item recovery	Same, plus custodians on the algorithm team
Loose files	Shared drives, OneDrive, SharePoint	Same
Structured data	ERP, CRM, financial reporting	Same, plus pricing-engine output databases
Chat	Teams, Slack, Google Chat, Signal (business use)	Same, plus channels used by pricing/algorithm teams
Algorithmic systems	—	Training data, model artifacts, configuration files, runtime logs, engineering tickets, version history

Table 1: HSR Second Request preservation scope expansion as implied by the RealPage consent decree. The bottom row did not appear in standard preservation specifications before November 2025.

The Regulatory Whiplash Underneath All This

While the DOJ has been quietly expanding what counts as relevant evidence, the formal HSR rulebook has been in chaos. The 2024 HSR Final Rule was the largest expansion of premerger reporting requirements in decades. It took effect February 10, 2025. From that date, filers had to produce narrative descriptions of competitive overlaps, organizational charts, transaction rationales, and Item 4(c)/(d) documents covering a broader date range than the old form contemplated.^[8] On February 12, 2026, the U.S. District Court for the Eastern District of Texas vacated the rule.^[9] The court held that the FTC had exceeded its statutory authority under the HSR Act's "necessary and appropriate" standard and that the rule was arbitrary and capricious under the Administrative Procedure Act. The FTC appealed to the Fifth Circuit, which granted a temporary administrative stay while the underlying motion was briefed. The expanded form stayed in effect while the stay was in place.

Then, on March 25, 2026, the FTC and DOJ jointly issued a Request for Information seeking public comment on potential improvements to the HSR Premerger Notification and Report Form, with comments due May 26.^[10]^[11] Buried in the RFI's questions is one that, in any other regulatory environment, would have generated significant press coverage. The agencies are asking about "the costs and benefits associated with the use of artificial intelligence or generative AI tools in preparing HSR filings and responding to Second Requests."^[11] That question lands differently in light of the MLex disclosure seven weeks later. The agencies are asking filers and outside counsel what AI tools they use to respond to Second Requests, while not disclosing what AI tools the agencies themselves use to evaluate the responses.

The combined effect on a corporate legal department trying to make a budget call about Second Request readiness in May 2026 is hard to overstate. The substantive HSR form requirements are in litigation. The agencies are running an RFI that may produce a new rule. The DOJ has confirmed it uses AI in antitrust enforcement but will not specify what. The most recent published consent decree treats algorithmic systems as discoverable. And the EU AI Act's high-risk classification for legal AI systems takes effect August 2, 2026. Any cross-border deal involving European operations is also navigating a parallel regulatory regime with its own AI documentation requirements.

Timeline of HSR rulebook turbulence 2024 to 2026 overlaid with the algorithmic discovery story showing the gap between public and internal AI rulemaking

Figure 2: HSR rulebook turbulence overlaid with the algorithmic discovery story. The substantive rulebook is in litigation, the AI rulebook is being written publicly, and the AI rulebook the DOJ runs internally is not being written publicly at all.

The Three Operational Changes for Producing Parties

ComplexDiscovery's analysis of the MLex disclosure identifies three concrete changes a producing party should make to its Second Request response architecture.^[4] I want to push past the article on each of them, because each one carries cost and staffing implications that are worth being specific about.

Custodian identification has to reach the algorithm teams

A traditional Second Request custodian list reaches the executives who approved final pricing, the sales leaders who set discount authority, and the legal team that vetted competitive communications. The post-RealPage custodian list adds the data scientists who built the pricing model, the analysts who interpret its output, the product managers who specified its behavior, and the engineers who maintain its production deployment. In a midsize company, that may double the custodian count. In a large company with a pricing analytics team distributed across multiple business units, it may add forty or fifty custodians the legal hold notice has never reached.

This is a real number with a real cost. At industry-standard per-custodian collection rates of roughly $1,500 to $3,500 depending on data volume and platform mix, an additional forty custodians is an addition to the collection budget that reaches six figures at the upper end of that range. The producing party can argue proportionality, but the argument is harder to make when the agency's own published settlement architecture treats the algorithm team's records as central evidence.
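To make the arithmetic concrete, here is a minimal sketch that multiplies the per-custodian range quoted above by the added custodian counts mentioned earlier. The function name and the choice of forty and fifty custodians are illustrative; actual rates vary by vendor, data volume, and platform mix.

```python
def collection_cost_range(custodians: int,
                          low_rate: int = 1_500,
                          high_rate: int = 3_500) -> tuple[int, int]:
    """Return (low, high) collection cost in dollars for the added custodians.

    The default rates are the rough per-custodian range quoted in the text,
    not a vendor quote.
    """
    return custodians * low_rate, custodians * high_rate


for added in (40, 50):
    low, high = collection_cost_range(added)
    print(f"{added} added custodians: ${low:,} to ${high:,}")
```

For forty custodians the range works out to $60,000 to $140,000, so whether the addition is five or six figures depends on where in the rate range the matter lands.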

Preservation specifications have to name the data types explicitly

Standard preservation specifications cover email, files, structured business records, and chat. They do not, by default, cover model training datasets, model version repositories, model artifact stores, configuration management systems, or runtime telemetry. The producing party that issues a generic preservation hold runs two risks. The model artifact store may get routine maintenance that overwrites prior versions. The configuration management system's history may get pruned for storage reasons. Both events would, under the RealPage standard, be spoliation. The hold has to name the systems.

This implicates infrastructure most legal holds do not currently reach. The model artifact store may be in Hugging Face, MLflow, Weights & Biases, or a homegrown S3-backed system. The configuration management system may be Git, GitOps tooling, or a deployment platform like Spinnaker or ArgoCD. The runtime telemetry may be in Datadog, Splunk, or a custom logging pipeline. The legal hold custodian for each of those systems is the engineering manager whose team owns the platform, not the IT department. The hold has to be drafted in language those engineering managers will recognize and act on.
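One way to keep a hold from going generic is to track it as structured data and check it against the system categories named above. The sketch below is an assumption about how a legal operations team might do that, not a description of any existing tool. Every system name, owner, and job name in the example is hypothetical.

```python
from dataclasses import dataclass

# Categories taken from the preservation discussion above.
REQUIRED_CATEGORIES = {
    "training_datasets",
    "model_version_repository",
    "model_artifact_store",
    "configuration_management",
    "runtime_telemetry",
}


@dataclass(frozen=True)
class HoldTarget:
    category: str             # one of REQUIRED_CATEGORIES
    system: str               # the name the engineering team actually uses
    owner: str                # engineering manager whose team owns the platform
    suspend: tuple[str, ...]  # routine jobs to pause while the hold is active


def missing_categories(targets: list[HoldTarget]) -> set[str]:
    """Categories the hold does not yet name explicitly."""
    return REQUIRED_CATEGORIES - {t.category for t in targets}


def render_notice(targets: list[HoldTarget]) -> list[str]:
    """One plain-language line per system, addressed to its owner."""
    return [
        f"{t.system} ({t.category}): owner {t.owner}; suspend {', '.join(t.suspend)}"
        for t in sorted(targets, key=lambda t: t.category)
    ]


# Hypothetical example: three systems named, two categories still uncovered.
hold = [
    HoldTarget("model_artifact_store", "MLflow registry (pricing)", "ML platform manager",
               ("artifact garbage collection",)),
    HoldTarget("configuration_management", "pricing-config Git repo", "Pricing engineering manager",
               ("history rewrite", "branch pruning")),
    HoldTarget("runtime_telemetry", "Datadog pricing-engine logs", "SRE manager",
               ("log retention expiry",)),
]

print("\n".join(render_notice(hold)))
print("Not yet named:", sorted(missing_categories(hold)))
```

The point of the check is that a gap shows up as a named category rather than as an omission nobody notices until a retention job has already run.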

TAR validation becomes a defensive document

Second Request productions have used technology-assisted review for a decade. Validation reports have existed all along — recall, precision, F1 against a control set, sampling protocols. What changes when the agency runs its own AI against the production is that the validation report stops being a methodology disclosure and starts being a defensive exhibit. The agency can compare the producing party's recall claims against its own scoring of the production. Discrepancies will surface. The producing party that cannot explain why its recall calculation diverges from the agency's implicit scoring is exposed.
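For readers who have not computed these metrics by hand, a minimal sketch of recall, precision, and F1 from control-set counts follows. The counts are invented for illustration, and the function name is hypothetical. Real validation protocols add sampling and reviewer-agreement steps this does not cover.

```python
def tar_metrics(true_pos: int, false_pos: int, false_neg: int) -> dict[str, float]:
    """Recall, precision, and F1 from control-set coding counts.

    true_pos:  responsive documents the model marked responsive
    false_pos: non-responsive documents the model marked responsive
    false_neg: responsive documents the model missed
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative control set: 100 truly responsive documents, 80 of them found,
# plus 20 non-responsive documents incorrectly marked responsive.
metrics = tar_metrics(true_pos=80, false_pos=20, false_neg=20)
print({name: round(value, 3) for name, value in metrics.items()})
```

If an agency's own scoring implies more missed responsive documents than the control set did, the recall figure is the number that will be challenged first.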

Practical implication: TAR validation has to be designed for adversarial inspection. The control set has to be drawn from a defensible random sample with documented selection criteria. The reviewer agreement metrics have to be published. The sampling for production validation has to be statistically robust at a confidence level that survives independent verification. A 95% confidence level with a ±2% margin of error is the floor. Many practitioners are moving toward 99% confidence with a ±1% margin for high-stakes matters. The validation report has to be backed by a forensic-grade audit log that captures every decision the senior reviewer made about coding inconsistencies.
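The difference between those two thresholds is easier to see as sample sizes. The sketch below uses the standard normal-approximation formula for estimating a proportion, with the worst-case assumption that half the sampled documents are responsive and with no finite-population correction. That formula is an assumption chosen for illustration, not a protocol taken from any agency guidance.

```python
import math
from statistics import NormalDist


def validation_sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Documents to sample for a given two-sided confidence level and margin.

    Uses n = z^2 * p * (1 - p) / margin^2, where z is the normal quantile for
    the confidence level. p = 0.5 is the most conservative (largest n) choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


for confidence, margin in ((0.95, 0.02), (0.99, 0.01)):
    n = validation_sample_size(confidence, margin)
    print(f"{confidence:.0%} confidence, ±{margin:.0%} margin: {n:,} documents")
```

Under these assumptions the floor requires about 2,401 sampled documents and the stricter standard about 16,588, roughly seven times the review effort for the validation sample alone.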

This is more rigor than most producing parties have applied to TAR validation. Vendors who have built their AI review platforms around speed of throughput, with validation as a back-office function, will need to surface validation as a first-class product feature. Producing parties evaluating those vendors should ask, before signing the contract, whether the platform can generate a validation package that holds up against the kind of inspection a sophisticated requesting agency will now perform.

What Assefi Actually Said

Acting Assistant Attorney General Omeed Assefi delivered remarks at NYU's Engelberg Center on Innovation Law and Policy on May 7, 2026, six days before the MLex disclosure landed.^[12] The speech was titled "Shirley Temple Antitrust: A Fresh Take on an Old Classic." The headline in press coverage was a warning to dealmakers. Do not dress up ordinary merger rationales in AI disruption language. The Antitrust Division now expects "actual evidence" before treating AI-based competitive claims seriously.^[13]^[14]

The implication of "actual evidence" in this context is more demanding than a press summary suggests. Say a merging party claims AI will be transformative for the combined entity's competitive position. The Division will want to see internal documents that support the claim. The engineering roadmaps. The productivity studies. The procurement decisions. The training data the combined entity will have access to that neither party has alone. Those documents are now within the explicit scope of what the Division considers relevant to Section 7 analysis. The "actual evidence" standard is, in operational terms, a discovery obligation.

The Division also noted in the same speech that it approved 99.5% of HSR-filed mergers in fiscal 2025.^[13] That number cuts in two directions. It undermines the narrative that the Division is hostile to dealmaking. It also makes clear that the 0.5% that draws a Second Request faces an enforcement architecture that has been investing aggressively in analytical capability. The Division is not investigating more deals. It is investigating fewer deals more deeply.

For a producing party, the math is unambiguous. If you get a Second Request, you are in the 0.5% of filings the Division has affirmatively chosen to investigate, against an enforcement team that is now using AI to scope and prioritize the work. The traditional Second Request playbook (broad collection, TAR-assisted review, rolling production, privilege log negotiation) still applies. What changes is that the production has to anticipate adversarial analysis from a team with computational capability the producing party cannot inspect.

The eDiscovery Vendor Question

The vendor stack that will support post-RealPage Second Request work is, today, only partially built. The traditional Second Request vendor list — Lighthouse, Consilio, HaystackID, Epiq, Big 4 — has deep expertise in email, files, and chat collection and review. Algorithmic-system collection is a different muscle. It requires reach into ML platforms, version control systems, and observability tooling that sits inside the engineering organization, not the IT organization. The collection team has to be able to talk to a data engineering lead about how the feature store is versioned and whether the production model registry preserves a complete history of deployed models. Those are not eDiscovery vendor conversations as currently configured.

Some specialty firms have started moving into this space. Forensic-services firms with deep ML experience, including some boutique forensics shops that grew up in the cryptocurrency-tracing world, are the closest analog to what an algorithmic-systems collection team would look like. The eDiscovery majors will need to acquire or build that capability. Current Second Request review pricing already runs $25 to $60 per gigabyte processed under the AI-augmented review models that became standard in 2025. Collection itself is billed separately, typically at a flat rate per custodian or per source. Algorithmic-systems collection is about to add a premium category the market has not yet quoted.

A producing party budgeting for Second Request response in 2026 should plan for an algorithmic-systems collection line item that does not exist on most current vendor RFPs. The cost is genuinely difficult to estimate because there are not enough representative engagements yet to anchor it. A reasonable working estimate, based on similar specialty-collection categories in adjacent regulatory work, runs $75,000 to $250,000 for a single matter. The driver is the number of distinct ML platforms in scope and the depth of version history that has to be preserved. That estimate will refine as the market completes a few more reference engagements.

Pie chart of estimated cost distribution for a post-RealPage Second Request engagement highlighting the new algorithmic systems collection category

Figure 3: Estimated cost distribution for a post-RealPage Second Request engagement. The "Algorithmic systems collection" slice is the category most current vendor pricing models do not yet quote.

The Asymmetry That Matters

The smaller the producing party, the worse this gets. A Fortune 500 merger generates an HSR filing supported by a litigation budget that absorbs algorithmic-system collection as a line item. A midsize healthcare deal draws Second Request scrutiny under the current Division's emphasis on household-impact sectors. Those deals get done by regional firms with litigation support practices built for traditional collection categories. The midsize deal team gets the same Second Request, with the same implicit expectations about preservation scope. The cost asymmetry is real and structural. A regional firm cannot easily build internal capability for ML platform collection. The midsize buyer cannot easily absorb a quarter-million-dollar specialty-collection line item without renegotiating the deal economics.

This is the part of the story that doesn't make it into vendor white papers. The Antitrust Division's expanded view of what counts as evidence raises the floor for defensible Second Request response. The floor sits higher than many sophisticated producing parties currently operate at, and considerably higher than midmarket producing parties can plausibly reach without restructuring their litigation support arrangements. The companies that can absorb the cost will. The companies that cannot will produce something less than the agency's implicit standard and hope that what they produce satisfies the case team.

The Division has not said it will treat deficient productions as grounds for an adverse inference. The Division also has not said it will not. Producing parties operating below the post-RealPage standard are now exposed to an enforcement risk the case law has not yet defined, against an enforcement team running tools the producing parties cannot inspect. The smart move for any company facing a Second Request between now and the end of 2026 is to over-build the preservation scope. Document the production methodology with adversarial scrutiny in mind. Price the engagement for the new floor rather than the old one.

For PlatinumIDS clients with antitrust exposure, the practical operational message is direct. The preservation hold needs to name model artifact stores, configuration management systems, and runtime telemetry platforms by their actual technical names. The custodian list needs to include the data engineering leads who own those platforms. The TAR validation needs to assume the agency will run its own scoring. And the budget needs a line item for specialty collection that the standard vendor stack does not yet quote. The Division has changed the discovery map. The producing parties that read the change first will spend less money and take less risk than the ones that wait for a federal court to confirm what the May 13 disclosure has already made clear.