A Brilliant Intern With No Skin in the Game

What happened when I rebuilt my Stanford course for the agentic AI era — and let the students grade the experiment

Jun 09, 2026

This spring I treated my own course as a field experiment.

MS&E 272, Entrepreneurship Without Borders, has always asked students to do the unglamorous core work of starting a company: find a real problem, talk to real customers, narrow to something buildable, and prove that someone, somewhere, will commit. What changed this year is that every student walked in with a tireless research analyst, market-sizer, interview-protocol writer, and prototype-builder already in their pocket.

That forced a question I couldn’t teach around: when the first draft of everything is suddenly free, what is actually left for a founder to do?

So I rebuilt the course to answer it.

The redesign: require AI, then require breaking it

The lazy options were to ban AI or to wave it through. I did neither. Every team had to use agentic tools to generate personas, size markets, draft interview protocols, build prototypes — and then every assignment carried a second requirement: take that output into the world and try to falsify it. Four design choices made that rhythm real, because a principle that isn’t in the rubric is just a vibe.

I graded the prompt log, not just the output. Once AI makes the polished artifact nearly free, the artifact stops being good evidence of thinking. So the prompt became the thing I assessed: the sequence of questions a team asked, where they pushed back on the model, where they caught it. The reasoning trail is now the deliverable.

Verification became a professional habit, not a disclaimer. Every AI-generated fact had to be checked against a primary source. I was most aggressive about this on geography: these models carry a heavy US/Western prior, which is poison for a cross-border course. Part of the graded work was explicitly catching and correcting that prior: the moment the model assumes Plaid-style banking APIs exist, or that a “no” means the same thing in Jakarta as in Palo Alto.

The rubric used an explicit evidence hierarchy, and it protected customer discovery from automation. This was the change I’d defend hardest. The temptation in the agentic era is to let students “interview” synthetic personas and have AI summarize what customers probably think. So I ranked evidence: AI-synthesized interviews sit at the bottom; real behavioral commitments (money, time, social capital, a signed LOI) are the gold standard. Skin in the game, not expressed interest. The rubric rewards evidence depth far more than polish, precisely because AI makes polish free. (I worked out the underlying argument, why customer evidence has quietly become cheap talk, and the shift from AI as generator to AI as verifier, in an earlier piece here.)
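For readers who think in code, here is a minimal sketch of how a hierarchy like this could be encoded. It is an illustration, not the course rubric: the essay only fixes the bottom tier (AI-synthesized interviews) and the top tier (behavioral commitments), so the middle tier for expressed interest, the `Evidence` name, and the sample claims are all assumptions.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Evidence tiers, weakest to strongest. Only the bottom and top tiers
    come from the essay; the middle tier's placement is an assumption."""
    AI_SYNTHESIZED = 0         # simulated personas, AI summaries of "what customers think"
    EXPRESSED_INTEREST = 1     # a real person saying "this is interesting"
    BEHAVIORAL_COMMITMENT = 2  # money, time, social capital, a signed LOI


# Hypothetical claims a team might bring to a review.
claims = [
    ("Simulated persona said she would pay", Evidence.AI_SYNTHESIZED),
    ("Landlord called the idea interesting", Evidence.EXPRESSED_INTEREST),
    ("Landlord signed an LOI for a pilot", Evidence.BEHAVIORAL_COMMITMENT),
]

# Sort strongest evidence first, so a review starts with commitments.
ranked = sorted(claims, key=lambda claim: claim[1], reverse=True)

for text, tier in ranked:
    print(f"{tier.name:<22} {text}")
```

The ordering is the only thing the sketch asserts. How many points each tier earns is left out on purpose, since the essay does not give weights.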

More weight on problem-finding than problem-solving. AI is excellent at solving a crisply stated problem and terrible at telling you you’re solving the wrong one. So the framing (can you state what you’re actually solving, in one sentence?) now carried more of the grade than the solution.

The pattern was baked in from the very first assignment. Before a team had even locked its problem, it had to put a generative model to work as a thinking partner — map the five most underserved sub-problems in its space outside the US, then make it play a skeptical investor who had seen 200 pitches and name the three reasons the idea would fail. But the graded core wasn’t the model’s output. It was an individual reflection in which each student had to point to exactly where the AI hallucinated, oversimplified, or tipped its hand toward Western, US-centric markets, plus an interview target list of at least five real, named people the team would actually contact.

The centerpiece assignment made the order explicit. Customer discovery was structured as a five-step workflow that deliberately sandwiched human contact between two layers of AI. Teams built personas and an interview guide with AI; then practiced against AI-simulated personas to find the weak questions before spending a real person’s time; then ran five to ten interviews with actual humans, the step the assignment flatly calls the irreplaceable core; then fed the transcripts back to AI for pattern-finding while forcing it to steelman the opposite conclusion (make the case that this problem is not worth solving) and to name what it might be misreading; and only then sized the market, with every figure traceable to a named primary source. AI prepared the conversation and synthesized it afterward. It was never allowed to be the conversation.

By the unit-economics assignment later on, the verification rule had hard teeth: teams modeled LTV and CAC with AI-surfaced benchmarks, but any number that traced back only to the model, with no primary source behind it, earned no credit. Across all of these, a one-page AI prompt log was a standard deliverable, and the rubric paid points for catching the AI’s mistakes, not for hiding them. The polish was free; the credit was for the verification.
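To make the “no primary source, no credit” rule concrete, here is a rough sketch of a unit-economics check. Everything in it is an assumption for illustration: the `Figure` class, the simple LTV formula (margin-adjusted monthly revenue divided by monthly churn), and the numbers and source labels. None of it is taken from the assignment itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Figure:
    """One input to the model, with where it came from."""
    name: str
    value: float
    primary_source: Optional[str] = None  # None: traces back only to the AI


def uncredited(figures: List[Figure]) -> List[str]:
    """Names of figures with no primary source behind them."""
    return [f.name for f in figures if not f.primary_source]


def ltv(monthly_revenue: float, gross_margin: float, monthly_churn: float) -> float:
    """Simple LTV: monthly margin per customer times expected lifetime (1 / churn)."""
    return monthly_revenue * gross_margin / monthly_churn


def cac(acquisition_spend: float, customers_acquired: float) -> float:
    """Customer acquisition cost: spend divided by customers won."""
    return acquisition_spend / customers_acquired


# Illustrative numbers and hypothetical sources only.
figures = [
    Figure("monthly_revenue", 20.0, "customer interviews (hypothetical)"),
    Figure("gross_margin", 0.5, "supplier quote (hypothetical)"),
    Figure("monthly_churn", 0.05),  # AI-surfaced benchmark, never verified
    Figure("acquisition_spend", 3000.0, "pilot ad invoice (hypothetical)"),
    Figure("customers_acquired", 60.0, "pilot sign-up sheet (hypothetical)"),
]
values = {f.name: f.value for f in figures}

lifetime_value = ltv(values["monthly_revenue"], values["gross_margin"], values["monthly_churn"])
acquisition_cost = cac(values["acquisition_spend"], values["customers_acquired"])

print("LTV:", lifetime_value, "CAC:", acquisition_cost)
print("No credit until sourced:", uncredited(figures))
```

In this sketch the LTV figure still computes, but it leans on a churn number nobody verified, so the check flags it. That is the rule in miniature: the arithmetic is free, and the credit is for the sourcing.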

One more mechanic tied it together. In the final presentations I added a required slide that turned out to be the most interesting thing all quarter: show us where AI got it wrong, once you’d done your real interviews. That single prompt is where the best lines in this essay came from.

The design principle, which one team stated more cleanly than my syllabus did: treat every output as something to verify, not accept.

The room was the counterweight

Reworking the assignments was only half of it. The other half was leaning harder on the one thing AI structurally cannot provide: practitioners who have actually operated across the borders we study. The guest roster this quarter was deliberately international, and in nearly every case the speaker’s value was precisely the local, hard-won knowledge the models invent.

Emon Shakoor, who founded Blossom Accelerator (Saudi Arabia’s first tech-inclusion accelerator), came in early and set the tone: institutions and context decide who gets to build, in ways no amount of prompting will surface. Hans Tung of Notable Capital, a perennial name on the Forbes Midas List, walked the class through how he forms conviction on geographic bets (why the same idea is a different investment in Shanghai, São Paulo, or Menlo Park). Zhengzheng Pan, whose path runs from early Facebook through building Ads from China and leading Ant Group’s green-development work, closed our fundraising arc with how capital actually gets raised outside the US. Steve Schlenker of DN Capital brought the European venture lens, and Priyanka Ladha opened our sessions on India. In the Latin America session, Pedro Vallenilla and Arnoldo Gabaldón, the co-founders of Cashea, now Venezuela’s largest fintech, walked the class through building a consumer-credit business in a country where credit had essentially collapsed, alongside their early investor Iván Montoya of NuMundo Ventures. The African sessions ran two deep: Maxima Nsimenta, CEO of Livara, on building a consumer brand, then a panel of Africa-focused investors including Maya Famodu and Tawanda Sibanda. Throughout, my co-instructor Vimbayi Kajese anchored the inclusion session and the African ones. That roster makes “borders change the product” concrete in a way no slide, and no model, ever could.

The lecture sequence shifted to match. I carved out a dedicated session on AI and value propositions early in the skills module, a session on the implications of AI for unit economics in the middle, and a two-part fundraising arc closed out by a practitioner. The point of all of it was the same: by the time a team discovered their AI had confidently invented a regulatory regime in Korea or a procurement cycle in Ghana, they already had a felt sense of what real regional knowledge sounds like, and how different it is from fluent guessing.

What the teams built

The ventures were genuinely good, and notably international: a satellite venture untangling base-station siting and the rules for licensing spectrum across national jurisdictions; a cross-border payments company; a Thai rental-market app for handling lease agreements between landlords and tenants; a dating app; enterprise software for the Egyptian real estate market; and an AI matchmaking tool called Aunty AI. Several were two-sided marketplaces that learned they had to validate both sides before building anything.

But the part worth writing down isn’t the products. It’s what the teams learned about working with the machine, and arguing against it.

AI is a brilliant intern with no skin in the game

The single best line of the quarter came from a satellite team’s closing slide: “AI. A brilliant intern with no skin in the game.”

Their elaboration was sharper than most published commentary on the subject. AI was the best collaborator they had ever worked with and the most dangerous one, often inside the same conversation. It drafted their interview protocol in thirty seconds. It also confidently invented procurement cycles in Ghana, faculty budgets in Indonesia, and regulatory environments in Korea, fluently, without flinching, without ever once telling them it was guessing.

The hardest skill they developed, they said, wasn’t prompting. It was learning to recognize when an answer that sounded right was a hallucination dressed up convincingly. As they put it: AI cannot tell you you’re wrong about something it has no way of knowing, and it will never volunteer that it doesn’t know.

Another team compressed the entire risk into seven words on a slide titled Notes to the Next Batch: “AI makes you faster at being wrong.” That is, I think, the most important sentence any of them produced. The danger of these tools is not that they’re unhelpful. It’s that they make the whole loop (including the loop where you sprint confidently in the wrong direction) run faster.

Borders change the product

The international framing of the course turned out to be the perfect stress test for AI’s limits, because local reality is exactly the thing a model trained on the global average gets wrong.

A cross-border payments team put it best: you cannot first-principles your way to the fact that Venezuela effectively runs on stablecoins as default cash, while India runs on UPI, Brazil on Pix, China on WeChat, and the US on cards. Those aren’t deductions. They’re facts you only get by being there, or by talking to someone who is. The same team concluded that in a market with multiple exchange rates and no neutral source of truth, the hardest thing to build wasn’t infrastructure — it was a number that people would trust enough to act on. Trust was the product.

A satellite team learned the same lesson at human scale: four direct messages with one person living in Korea fixed more than four weeks of desk research. Their advice to themselves, if they could start the quarter over: talk to one person who lives in the space before building anything. The cheapest insurance against being wrong about a place is knowing one person in it.

The thing that got more valuable, not less

Here is the synthesis I didn’t fully expect going in.

I worried that agentic AI would hollow out customer discovery: why call ten landlords when a model will simulate them instantly? The opposite happened. Because AI made personas, models, and prototypes nearly free, the scarce input became contact with reality. The teams felt this viscerally. The team building a Thai rental-agreement app started out designing for landlords only; it was customer interviews, not the AI, that revealed they had to build for tenants too, a finding the model never surfaced and, arguably, never could.

The most striking endorsement of this came from inside the machine. One team’s own AI assistant told them, in so many words, that five hours of customer calls was worth more right now than another five hours of brainstorming with it. When the intern tells you to go outside and talk to people, you should probably listen.

The students felt the same thing from the other direction. One wrote, reflecting at the end of the quarter, that the hardest lesson was realizing that identifying a problem and identifying a problem people will pay to solve are two very different things. Given the chance to start over, they’d have begun customer discovery earlier and treated it as the main work rather than something that supports the product, and measured validation by commitment, not compliments. A customer saying “this is interesting” is a very different thing from agreeing to a pilot, an introduction, or a payment.

So the value didn’t disappear. It migrated. It moved from producing the first draft, now a commodity, to the things AI structurally cannot do: holding skin in the game, sitting in the discomfort of a real customer’s contradiction, and exercising the judgment to know when a confident answer is hollow.

What I’m keeping

A few of the teams’ own Notes to the Next Batch are going straight into next year’s syllabus, because they’re better than mine:

• Be able to explain what you’re solving in a single sentence. (It’s harder than it sounds.)

• Interview the person who already failed at this.

• Figure out how to delegate and collaborate early, not in week nine.

• Some things you genuinely won’t know until you build.

• Go-to-market is a filter, not a formality.

• Let yourself be wrong out loud. The team was most useful to each other when someone said “I think we’re solving the wrong problem” before they had the perfect replacement ready.

The quarter was short, but as one team wrote, the pattern is long. As founders, and as faculty rebuilding courses on shifting ground, we’re going to keep being wrong about new things: about markets, about customers, about what AI can and cannot do. The point was never to get it right by Friday. The point is to get faster and more rigorous at noticing when we’re wrong.

The intern is brilliant. Keep it. Just don’t let it be your CEO.

The note we ended on

We closed the quarter on a subject that has nothing and everything to do with AI: how to last.

Agentic tools make velocity nearly free, and the hidden cost of free velocity is that it smuggles in the worst assumption of hustle culture: that the right speed is always faster. So the final session was about the founder, not the venture: trust as a moat, storytelling as how you earn it, and enough emotional self-awareness to keep good decisions from curdling into burnout. Vimbayi put the blunt version to the room: what are you chasing that’s worth an early death? The companies still standing years from now will be the balanced ones. It’s a marathon you have to be alive (and in a good mental state) to finish.

That’s the through-line I want students to carry: institutions and culture shape who gets to build, entrepreneurship is a team sport, experiments beat opinions, and VC-backed hypergrowth is one path among many rather than the definition of success. Protect the asset (your health, your relationships, your identity outside the company), because the reason to build something is to be there for the part that matters. We sent them off with Apple’s old “Think Different” spot: an invitation to be the ones a little crazy enough to try.

Thanks to my co-instructor Vimbayi Kajese; to course assistants Yikai Cao and Xinyu Chang; to all the mentors who gave their time to coach teams through the quarter; to the judges who sat through the final presentations and gave students candid, real-world feedback; and to every team, whose closing slides this essay shamelessly steals from.

If you want the research this redesign rests on: a field experiment by Nick Otis and colleagues on GenAI and entrepreneurial performance; a field experiment with Yanbo Wang showing that being randomly assigned an entrepreneur as a mentor measurably shifts students’ career paths, especially for those whose parents weren’t entrepreneurs; a randomized experiment with Lynn Wu on how mentor-network diversity and adaptability shape venture outcomes; and a study with Yong Suk Lee finding that university entrepreneurship programs don’t necessarily produce more founders, but do help the founders they produce raise more capital and scale faster. The redesign is an attempt to act on both: better mentorship, and an education that makes the founders we do train measurably sharper.

Chuck Eesley
