Vertical or Horizontal: The Open Question I Can't Answer

I have written, on this site, fairly strong opinions about specialized legal AI tools. I have argued that most of them are wrappers around general-purpose models, that the markups are hard to justify, and that lawyers should try Claude and Claude Code before paying $500 a month for a “legal AI assistant.”

I stand by those positions. But I want to write today about the part of the question I am still genuinely uncertain about—because intellectual honesty matters, and because I think the choice between general-purpose AI and vertical legal AI is more interesting than my prior writing has fully acknowledged.

The question, stated bluntly: for serious legal work, are general-purpose models like Claude better than vertical legal AI like Harvey or CoCounsel, or worse, or just different?

I have been testing both, on real matters, for over a year. I do not have a confident answer. I have a working hypothesis, which I’ll share. But I want to flag, before I share it, that I think the honest position right now is “it depends, and the dependencies are not yet well understood.”

The case for vertical

Let me steelman the vertical legal AI position first, because I have been more critical of it in past writing and I want to be fair.

Vertical tools are designed around the workflow, not the model. When Harvey or CoCounsel produces output, the output is structured the way lawyers actually use legal output: in pleading format, in memo format, in clause-by-clause format. The general-purpose model has to be coaxed into producing similar structure with each prompt. The vertical tool has the structure baked in. For lawyers who do not want to invest time in prompt engineering, this matters.

Vertical tools have access to legal databases the general models don’t. Some vertical tools integrate with Westlaw, Lexis, or other authoritative legal databases. The output draws from primary sources rather than from whatever happened to be in the model’s training data. For citation-heavy work, this is a real and important advantage.

Vertical tools claim to have addressed hallucination through retrieval-augmented generation (RAG). Rather than generating citations from the model’s general knowledge, the tool retrieves real cases and then has the model write around them. In theory, this eliminates one of the most dangerous failure modes of general AI in legal work. In practice, the implementation varies enormously by vendor.
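
To make the retrieve-then-write pattern concrete, here is a minimal Python sketch. Everything in it is hypothetical: the two-entry corpus, the made-up case names, the keyword-overlap scoring, and the function names `retrieve` and `build_prompt` are illustrations, not any vendor's implementation. Real tools presumably use far better retrieval against licensed databases.

```python
import re

# Hypothetical mini-corpus standing in for a licensed legal database.
# The case names and holdings are made up for illustration.
CORPUS = [
    {"cite": "Example v. Sample (hypothetical)",
     "text": "A non-compete clause is unenforceable when its duration is unreasonable."},
    {"cite": "Placeholder v. Demo (hypothetical)",
     "text": "A landlord must return the security deposit within a reasonable time."},
]

def tokens(text):
    """Lowercase word set; a crude stand-in for real search indexing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, corpus, k=1):
    """Rank documents by how many words they share with the question."""
    q = tokens(question)
    ranked = sorted(corpus, key=lambda d: len(q & tokens(d["text"])), reverse=True)
    return ranked[:k]

def build_prompt(question, sources):
    """Tell the model to rely only on the retrieved sources."""
    listed = "\n".join(f"- {s['cite']}: {s['text']}" for s in sources)
    return (
        "Answer using ONLY the sources below. Cite nothing else.\n"
        f"Sources:\n{listed}\n\nQuestion: {question}"
    )

question = "Is a five-year non-compete clause enforceable?"
sources = retrieve(question, CORPUS)
prompt = build_prompt(question, sources)
print(prompt)  # this string is what would be sent to the model
```

Note what the sketch does and does not do: it constrains which cases can appear in the answer, but nothing in it checks whether the model describes those cases accurately.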

Vertical tools are easier to defend to firm IT and risk committees. A partner adopting Harvey can say “this is an approved legal-tech vendor with SOC 2 compliance.” A partner using Claude on their personal device is making a different and more individual argument. For lawyers operating within institutional risk frameworks, the vertical tool may be the only feasible option, regardless of relative quality.

If you weight these factors heavily, vertical AI looks attractive. There is a real argument to be made.

The case for horizontal

Now let me state the case for general-purpose models, which is closer to my prior writing.

General-purpose models are improving faster. Claude, GPT, and Gemini are updated frequently, often with major capability increases. The latest Claude is meaningfully better at legal reasoning than the version from six months ago. Vertical tools, by contrast, depend on whatever foundation model they’re built on, so their improvements lag the underlying models. The lawyer who uses Claude directly is always on the frontier; the lawyer who uses a vertical tool may be one or two generations behind.

General-purpose models are more flexible. I have used Claude to do contract review, legal research, memo drafting, deposition prep, and to build small tools for my own practice (a clause library, a redline tracker, a client question generator). No vertical tool I have tested supports this range. They are designed for specific workflows. The general model is a general intellectual tool.

General-purpose models cost dramatically less. Claude Pro is $20 a month. Harvey, depending on the tier, is several hundred to over a thousand dollars per user per month. The cost differential adds up quickly over an associate’s career. If the general model is even within shouting distance of the vertical tool on quality, the cost difference dominates the calculation.
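
The arithmetic is easy to run yourself. The short Python sketch below uses $20 a month for Claude Pro and, as an assumption, $500 a month for the vertical tool (the “legal AI assistant” figure from the top of this article; actual vendor pricing varies by tier). The five-year horizon is also just an illustrative choice.

```python
def cumulative_cost(monthly_price, years):
    """Total subscription spend per user over a number of years."""
    return monthly_price * 12 * years

GENERAL_MONTHLY = 20    # Claude Pro price quoted above
VERTICAL_MONTHLY = 500  # assumed figure; real pricing varies by vendor and tier
YEARS = 5               # illustrative horizon, not a claim about career length

general = cumulative_cost(GENERAL_MONTHLY, YEARS)
vertical = cumulative_cost(VERTICAL_MONTHLY, YEARS)
difference = vertical - general

print(f"General-purpose: ${general:,}")
print(f"Vertical:        ${vertical:,}")
print(f"Difference:      ${difference:,} per user")
```

Under those assumptions the gap is $28,800 per user over five years. Swap in your own vendor quote and horizon; the point is only that the per-seat difference scales linearly with both.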

General-purpose models do not lock you into a vendor’s worldview. A vertical tool decides, on your behalf, what a legal memo should look like, what questions are worth asking, what workflows are supported. The general model is neutral. The lawyer’s own taste and judgment shape the output. For senior lawyers with strong professional habits, this matters.

If you weight these factors heavily, horizontal AI looks better. The argument is also real.

Where I am genuinely uncertain

After a year of using both, here is what I cannot answer with confidence:

For lawyers who haven’t invested in prompt engineering, does vertical AI close the gap? I am efficient with general models because I have spent considerable time learning how to prompt them. A junior lawyer without that investment might get more value from a vertical tool that has prompting built in. I don’t know how to weigh this. The answer depends on whether prompting skill is a one-time investment or an ongoing requirement.

Is the integration advantage of vertical tools sustainable? Westlaw and Lexis integration is currently a meaningful differentiator for vertical tools. But the general models are increasingly being given access to the open web and to standardized databases. Within two or three years, the integration advantage might disappear. Or it might deepen, as vertical vendors negotiate exclusive partnerships. I don’t know which way this goes.

Does the institutional adoption argument decide the question regardless of relative quality? If a firm has approved Harvey and not approved Claude, the partner can use Harvey today and cannot easily use Claude. The relative quality of the tools doesn’t matter; the institutional decision has already foreclosed the comparison. Most firms are in this position. The question of which tool is better may be moot for most lawyers, because they don’t actually get to choose.

Do the failure modes differ in important ways? A general model hallucinates: it can generate citations from its general knowledge that do not correspond to real cases. A vertical tool with RAG produces a different kind of error: real cases cited in misleading ways, retrieved documents misinterpreted, false confidence created by the appearance of grounding. I am not yet able to compare these failure modes on a like-for-like basis. They both happen. They both matter. They are different problems with different mitigations.

My current working hypothesis

For what it’s worth, here is where I am right now.

For lawyers like me—senior practitioners with established workflows, time to invest in prompting, and personal control over our tool choices—general-purpose models like Claude are clearly better. The flexibility, the rate of improvement, the cost, and the lack of vendor lock-in outweigh the workflow conveniences of vertical tools. I will continue to use Claude and Claude Code as my primary AI tools.

For lawyers at large firms with institutional risk constraints, vertical tools may be the only option, regardless of relative quality. The question of “which is better” is not the right question for them. The question is “which is approved.” If only Harvey is approved, then Harvey is the right tool, and the comparison to Claude is academic.

For lawyers in between—solo practitioners and small-firm lawyers with personal control over their stack, but without the time to develop deep prompting skills—I genuinely don’t know. The vertical tool might save enough learning time to justify the cost. Or the general model might be a better long-term investment. I would test both, on real matters, for a few months, and decide based on personal experience rather than on what I or anyone else writes about it.

What I would test if I had unlimited time

There are a few experiments I would run if I had the resources:

Side-by-side on the same matters. Take fifty real legal matters of varying types and have both a general model and a vertical tool produce output for each. Have a senior lawyer blind-grade the outputs. See where the differences actually lie.
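
A minimal Python sketch of the blinding and tallying step, under stated assumptions: the labels "general" and "vertical", the function names, and the stand-in grader are all hypothetical, and a real study would need more than one grader and a proper scoring rubric.

```python
import random
from collections import Counter

def blind_pair(general_output, vertical_output, rng):
    """Shuffle the two outputs so the grader sees only 'A' and 'B'."""
    items = [("general", general_output), ("vertical", vertical_output)]
    rng.shuffle(items)
    shown = {"A": items[0][1], "B": items[1][1]}  # what the grader sees
    key = {"A": items[0][0], "B": items[1][0]}    # kept hidden until grading ends
    return shown, key

def tally(matters, grade, seed=0):
    """Count which tool's output the grader preferred across all matters.

    `grade` takes the blinded dict and returns 'A', 'B', or 'tie'.
    """
    rng = random.Random(seed)
    wins = Counter()
    for general_output, vertical_output in matters:
        shown, key = blind_pair(general_output, vertical_output, rng)
        choice = grade(shown)
        wins[key.get(choice, "tie")] += 1
    return wins

# Stand-in data and grader for demonstration only: this "grader" simply
# prefers the longer text, which no real senior lawyer should do.
matters = [("short memo", "a much longer memo"), ("detailed analysis", "brief")]
prefer_longer = lambda shown: "A" if len(shown["A"]) > len(shown["B"]) else "B"
result = tally(matters, prefer_longer)
print(result)
```

The design choice that matters is keeping `key` away from the grader until all fifty matters are scored, so the grade cannot be influenced by knowing which tool produced which output.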

Long-tail tasks. Vertical tools shine on the standard tasks they were designed for. Test them on unusual matters—cross-border issues, novel structures, edge cases. My suspicion is that vertical tools degrade more steeply on edge cases than general models, but I haven’t confirmed it.

Junior lawyer adoption curves. Have new associates use one tool for six months and the other for six months. Measure which produces better skill development. My suspicion is that general models are better for skill development because they require active prompting, but again, I haven’t confirmed it.

I would love to see these experiments run. As far as I can tell, no one is running them, because the financial incentives are pointing in the other direction—vendors want to sell vertical tools, and general model companies don’t compete in the legal-vertical space directly.

A request to readers

I am writing this article in part because I want to hear from lawyers who have made the comparison themselves. If you have meaningful experience with both vertical and horizontal AI in legal practice, I’d like to know:

What were the failure modes you actually encountered?
Where did one clearly outperform the other?
Did your conclusion change over time as you got better at one or the other?

Email me at [email protected]. I will not publish anything without permission, but I want to build a clearer picture than I can build from my own experience alone.

This is, I think, the most consequential open question in legal AI right now, and the public discussion of it is dominated by people who have financial reasons to take a position. I would prefer to hear from people who are actually using these tools to do real work, and who don’t have a horse in the race.

The honest answer to “vertical or horizontal” is “I don’t know yet.” I am writing under a pen name partly so I can say that without being expected to take a side. The lawyers who claim certainty here are either selling something, or have not yet been through enough cases to have learned the limits of their own conclusions.

I’ll keep testing. I’ll write again when I know more.

This is the seventeenth article on Counsel and Code. Previous articles have argued more strongly for general-purpose tools; this article is an attempt at honest self-correction. Related: why most legal AI is just GPT with a markup and why arguing with AI beats trusting it.

If you’ve done your own comparison and want to share findings, email [email protected].