From novelty to nuance

The promise of artificial intelligence often sounds effortless: smarter tools, faster decisions, and leaner operations, all unlocked with a few lines of code. In practice, it has proved far messier. Across industries, companies are discovering that making AI genuinely useful demands much more human judgment, patience, and customization than early marketing suggested.

A small but telling example emerged last spring at CellarTracker, a popular wine-collection app. The company built an AI-powered sommelier designed to give blunt, personalized wine recommendations based on a user’s tastes. Instead, the chatbot was relentlessly polite.

“It was always trying to be agreeable,” said CellarTracker CEO Eric LeVine. “Rather than saying, ‘You’re really not going to like this wine,’ it would hedge.” Engineers spent six weeks tweaking prompts and guardrails before the tool would offer candid opinions. Only then did it feel ready for users.

Big expectations, modest returns

CellarTracker’s experience reflects a broader pattern. Since ChatGPT burst into public view three years ago, companies of all sizes have rushed to embed generative AI into products and workflows. Yet meaningful financial returns have been elusive.

Surveys of executives paint a sobering picture. A Forrester Research study of more than 1,500 executives found that just 15% had seen profit margins improve as a result of AI over the past year. Consulting firm BCG reported even lower numbers: only 5% of executives said AI had delivered widespread value across their organizations.

The enthusiasm has not disappeared, but timelines are shifting. Forrester now predicts that by 2026, companies will delay roughly a quarter of their planned AI spending by at least a year.

“The story from tech vendors is that everything changes overnight,” said Forrester analyst Brian Hopkins. “But organizations—and people—don’t actually change that fast.”

The “easy button” that wasn’t

Early excitement around generative AI often framed it as an “easy button” for productivity. Companies set up internal task forces, rolled out pilots, and assumed large language models could quickly absorb and interpret vast amounts of information.

Reality intruded. One persistent issue is what researchers call “sycophancy”—the tendency of models to please users rather than challenge them. That bias can encourage engagement but undermines usefulness in areas that demand judgment or criticism.
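
CellarTracker's six weeks of prompt work point at the standard first-line countermeasure: a system prompt that explicitly licenses candor. The sketch below is a generic illustration of that kind of guardrail, assuming an OpenAI-style chat API; the prompt wording, model name, and function are placeholders, not the company's actual configuration.

```python
# A generic anti-sycophancy guardrail: a system message that explicitly
# licenses candor. Prompt wording, model name, and function signature are
# illustrative placeholders, not CellarTracker's actual configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a blunt sommelier. When a wine is a poor match for the "
    "user's stated tastes, say so directly, e.g. 'You're really not going "
    "to like this wine,' and explain why. Never soften a negative verdict "
    "into vague praise, and never agree just to be agreeable."
)

def recommend(user_profile: str, wine: str) -> str:
    """Ask for a candid verdict on one wine for one user profile."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works here
        temperature=0.3,      # lower temperature for steadier verdicts
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"My tastes: {user_profile}\nWine: {wine}\nShould I buy it?"},
        ],
    )
    return response.choices[0].message.content
```

Whether such instructions actually stick varies from model to model, which is one reason the tuning took weeks rather than hours.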

In other cases, consistency proved elusive. At Canadian rail services provider Cando Rail and Terminals, managers tested an internal chatbot designed to summarize safety rules and training materials. The system repeatedly stumbled over the Canadian Rail Operating Rules, a core 100-page document for the industry. Sometimes it forgot key rules; other times it invented them.

After spending roughly $300,000, the company paused the project.

“We all thought it would be simple,” said Jeremy Nielsen, Cando’s general manager. “That’s just not what happened.”
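
Failures like Cando's often stem from asking a model to recall a long rulebook from memory or a single oversized prompt. A common mitigation in the industry, sketched generically below rather than as anything Cando built, is retrieval grounding: pull the passages relevant to a question out of the document first, then instruct the model to answer only from them.

```python
# Generic retrieval-grounding sketch: instead of trusting a model to
# remember a 100-page rulebook, fetch the passages relevant to a question
# and instruct the model to answer only from them. Word-overlap scoring is
# a deliberate simplification; real systems typically use embedding search.

def chunk(document: str, size: int = 800) -> list[str]:
    """Split a long document into overlapping character windows."""
    step = size // 2
    return [document[i:i + size] for i in range(0, len(document), step)]

def overlap_score(query: str, passage: str) -> int:
    """Crude relevance score: how many query words appear in the passage."""
    query_words = set(query.lower().split())
    return sum(1 for word in passage.lower().split() if word in query_words)

def build_grounded_prompt(query: str, document: str, k: int = 3) -> str:
    """Assemble a prompt that quotes the top-k passages to the model."""
    passages = sorted(chunk(document),
                      key=lambda p: overlap_score(query, p),
                      reverse=True)[:k]
    context = "\n---\n".join(passages)
    return (
        "Answer using ONLY the rule text below. If the answer is not in "
        "the text, say you don't know; never invent a rule.\n\n"
        f"RULES:\n{context}\n\nQUESTION: {query}"
    )
```

The point is the structure, not the scoring: the model is quoted the rules instead of being trusted to remember them.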

When humans step back in

Few areas illustrate AI’s limits more clearly than customer service. Once widely expected to be automated away, human agents are making a comeback.

Swedish payments firm Klarna rolled out an AI-powered support agent in early 2024, initially touting it as equivalent to hundreds of human workers. By 2025, the company acknowledged that many customers still wanted to speak with a person—especially when problems became complex.

The revised strategy blends AI with human oversight. Simple issues are handled by chatbots, while nuanced cases are escalated to people. U.S. telecom giant Verizon has taken a similar approach, leaning back into human-staffed support after customers expressed frustration at being unable to reach a live agent.
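
In code, the blended model reduces to a triage decision: handle routine, confidently classified requests automatically and route everything else to a person. The sketch below is illustrative only; the categories, thresholds, and field names are hypothetical, not Klarna's or Verizon's actual logic.

```python
# Illustrative triage logic for the blended support model: software handles
# routine, confidently classified requests; anything nuanced, ambiguous, or
# stuck goes to a person. Categories and thresholds are hypothetical.
from dataclasses import dataclass

ROUTINE = {"order_status", "password_reset", "invoice_copy"}

@dataclass
class Ticket:
    category: str      # label from an upstream classifier
    confidence: float  # classifier confidence, 0.0 to 1.0
    turns: int         # exchanges the bot has already had on this issue

def route(ticket: Ticket) -> str:
    """Return 'bot' only for simple, high-confidence, fresh requests."""
    if (ticket.category in ROUTINE
            and ticket.confidence >= 0.9
            and ticket.turns < 3):
        return "bot"
    return "human"  # the default: when in doubt, escalate to a person
```

The design choice worth noting is the default: when in doubt, the ticket goes to a human, mirroring the escalation posture both companies describe.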

“Empathy is the biggest missing piece,” said Ivan Berg, who oversees AI-driven service operations at Verizon. “That’s what’s holding us back from letting AI handle everything.”

The jagged frontier of capability

Researchers describe AI’s uneven performance as a “jagged frontier.” Models can excel at advanced math or coding, yet fail at surprisingly basic tasks like understanding dates, locations, or context.

“It can be a Ferrari at math and a donkey at scheduling,” said Anastasios Angelopoulos, CEO of benchmarking firm LMArena.

For businesses, these gaps often surface in data-heavy environments. Financial firms, for example, pull information from dozens of sources formatted in incompatible ways. Without extensive data cleaning, AI systems can infer patterns that simply aren’t there.

At Dutch investment group Prosus, an internal AI agent is meant to answer portfolio questions that analysts typically handle. In theory, it could answer how often a delivery service was late in a specific city last week. In practice, it sometimes struggles to define “last week” or even the boundaries of a city.
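
The fix is usually unglamorous: resolve fuzzy language into explicit parameters before any data is queried. A minimal sketch, assuming a Monday-to-Sunday week convention, which is itself a choice someone has to make:

```python
# Pin the fuzzy phrase "last week" to explicit calendar boundaries before
# any data is queried. The Monday-to-Sunday ISO convention used here is an
# assumption; another team might mean the trailing seven days.
from datetime import date, timedelta

def last_week(today: date) -> tuple[date, date]:
    """Return the Monday and Sunday of the previous ISO calendar week."""
    monday_this_week = today - timedelta(days=today.weekday())
    start = monday_this_week - timedelta(days=7)
    return start, start + timedelta(days=6)

start, end = last_week(date(2025, 11, 19))  # a Wednesday
print(start, end)  # 2025-11-10 2025-11-16
```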

“People thought AI was magic,” said Prosus AI head Euro Beinat. “It’s not.”

More guidance, less illusion

Recognizing these challenges, AI developers are shifting tactics. OpenAI has created teams that work directly with enterprise clients, helping them identify smaller, high-impact use cases rather than sweeping transformations. Rival lab Anthropic is hiring “applied AI” specialists to embed with customer organizations.

The message is increasingly clear: AI works best when paired with deep understanding of specific workflows. That insight is fueling a wave of startups building narrowly focused tools for industries like finance, law, and marketing.

San Francisco–based Writer, for example, assigns engineers to collaborate directly with client teams, co-designing AI agents around real processes rather than generic use cases.

“Companies need a lot more handholding than they expected,” said Writer CEO May Habib.

A slower, more grounded future

The unprecedented investment flowing into chips, data centers, and energy infrastructure suggests that confidence in AI's long-term importance remains strong. But whether those bets pay off will depend less on flashy demos and more on careful integration.

For many companies, the next phase of AI adoption looks less like disruption and more like discipline: smaller projects, tighter feedback loops, and a renewed appreciation for what humans still do best.