Chuck Eesley

Ed Roberts

Chuck Eesley — Mon, 15 Jun 2026 15:01:14 GMT

This past weekend at Stanford, I watched another cohort of doctoral students be hooded. One of them was mine, and standing at the front while the chair read her name, I found myself thinking about my own PhD graduation at MIT, about Ed Roberts standing next to me that day, about my father in the same photograph, about the long arithmetic that has now placed me on the other side of that ceremony.

It is the closest I have come, since Ed died on February 27, 2024, to fully understanding what he was for.

He was my doctoral advisor at MIT Sloan. He was the founder of what is now the Martin Trust Center for MIT Entrepreneurship. He was the David Sarnoff Professor of the Management of Technology for most of my time as his student. Entrepreneurship as an academic field of course traces back further, to Schumpeter, to Drucker, to a long lineage of scholars who insisted there was something here worth studying. But high-technology entrepreneurship as a distinct empirical field of study really started with Ed. His 1991 book Entrepreneurs in High-Technology: Lessons from MIT and Beyond did more than any other single work to move the study of tech founders from collected anecdote to a real empirical field.

It has taken me more than two years to write anything public about him. Not because there hasn’t been something to say. Because there has been too much. And because watching one’s own students cross the same stage is, it turns out, the only place from which I knew how to write it.

I want to do three things in this piece. I want to tell you what kind of advisor he was. I want to widen the lens and talk about the intellectual lineage I inherited through him, both the methodology, through Jay Forrester and the founding of System Dynamics at MIT, and the formal economics lineage, through Franklin M. Fisher and the Tree of Zvi Griliches. And I want to be honest about what it has felt like, on the receiving end of that inheritance, to think about what it now obligates.

What it was like to be his student

MIT graduation, June 2009. From left: me, my father, and Ed Roberts.

The first thing to know is that Ed didn’t run a charismatic lab. He didn’t deliver thundering pronouncements. He worked from his office, with the door open, in conversations that sometimes lasted ten minutes and often lasted ninety, and that almost always left you with the same feeling: that the next move on your project had become clearer, and that he expected you to make it.

He had a quiet (and sometimes a LOUD) way of redirecting you when you were wrong without making it humiliating. He had an even quieter way of telling you when you were right, usually a sentence or two, then back to the next problem. He was generous with credit, careful with criticism, and unhurried in a way that I have come to appreciate more, not less, as I have advised my own students.

I joined MIT early, in the summer of 2005, before my coursework officially started, specifically to work on the 2003 wave of Ed’s MIT alumni survey. That survey, which Ed had been running in some form since the 1980s, was already a serious longitudinal instrument. It was about to become the basis of the 2009 Kauffman / MIT alumni-impact study (the one that estimated that companies founded by MIT alumni, if grouped as an independent nation, would generate roughly $2 trillion in annual worldwide sales, making them the eleventh-largest economy in the world at the time). Working on that 2003 wave in the summer of 2005 is how I learned what an alumni-entrepreneur survey actually was.

The decision that compounded the most for my career, though neither of us would have called it that at the time, was Ed’s role in making the Tsinghua alumni survey possible. He suggested the project in 2006. Then he raised the funding for it himself, with a single phone call to Charles Zhang (张朝阳), the Tsinghua-educated founder of Sohu.com, whom Ed had backed years earlier by writing the first angel check into what became Sohu. Charles in turn donated the funds to Tsinghua to support the survey. We launched the instrument in 2007.

Ed had been hosting Delin Yang and many other Chinese visiting scholars at MIT for years before that, so the institutional groundwork between MIT Sloan and Tsinghua was already in place. The Tsinghua connection set the trajectory for most of my subsequent research, and it is the reason I have spent so much time in China since. Years later I had the chance to interview Charles Zhang myself, alongside Yanbo Wang, closing a small loop that had started with a phone call Ed made in 2006.

The arc of my own work, pioneering large-scale alumni-entrepreneur surveys as a primary research instrument, runs directly out of Ed’s office between 2005 and 2007. Once I was on the Stanford faculty I extended the methodology, with Bill Miller, into the Stanford Alumni Innovation Survey. That instrument in turn inspired several others: Mike Lenox and Andy King’s UVA Darden study (2014), which I worked on; the University of Toronto’s alumni survey; and several more I had no direct hand in. None of it happens without that 2006 phone call.

That is the thing about good advisors. They do not tell you what your career is going to be. They put you in the room where it could happen, and they get out of the way.

Defending the dissertation at MIT Sloan, 2009. Ed across the table, just where he had always been.

What he built

Ed had a foot in all three worlds: academic, entrepreneurial, and philanthropic. He insisted on all three at once for sixty years. Of all the things I find myself emphasizing to my own students about his career, that breadth keeps moving up the list as I get older, because it is what set him apart from almost everyone else who ever studied entrepreneurship.

In the 1960s, while still a junior faculty member at Sloan, he co-founded Pugh-Roberts Associates, a firm that took system-dynamics consulting to government and private-sector partners and that still exists today as the Sage Analysis Group. A few years later he co-founded Medical Information Technology, Inc. (MEDITECH), which became one of the most durable health-care IT firms in the country. In the 1980s and 1990s he co-founded a series of Zero Stage Capital VC funds, putting early-stage money into MIT-adjacent technology startups long before that model was standard. By the time he was finished, per MIT’s accounting, Ed had co-founded 14 companies and served on the boards of more than 40. The first angel check into what later became Sohu.com was Ed’s: the same investment in Charles Zhang that, a decade later, made our Tsinghua alumni survey possible.

The institution-building was the same instinct turned on MIT itself. The MIT Entrepreneurship Center, which Ed founded in 1990, did not exist in any meaningful form before him. Neither did the MIT Sloan Management of Technology (MOT) program he chaired, nor the MIT Sloan Entrepreneurship and Innovation Certificate program he co-created. He spent half a century quietly turning MIT into the entrepreneurial place we now take it to be. As MIT President Sally Kornbluth wrote the day he died, “It is not too much to say that MIT’s flourishing entrepreneurial culture and global reputation as a source of influential start-ups grew from seeds Ed planted here 50 years ago.” His successor at the Martin Trust Center, Bill Aulet, put it more directly: “Virtually everything today in the MIT entrepreneurial ecosystem, from classes to extracurricular activities, has some level of Ed’s DNA at its core.”

The philanthropy was less visible but equally serious. Directly, indirectly, and through the institutions he built, Ed inspired and funded generations of MIT alumni who went on to start companies, support causes, and back the next generation in turn. The arithmetic of that giving compounds the same way the academic genealogy does: many of the founders Ed helped make possible have gone on to make others possible. The Charles Zhang phone call that paid for the Tsinghua survey is one small data point in a very long series.

The thing that strikes me, looking at all of that together, is how unified it was. The advising, the research, the company-building, the institution-building, the philanthropy: these were not separate pursuits Ed happened to also do. They were one project, in three worlds. A scholar of entrepreneurship who also practiced it and funded it could see things about entrepreneurship that a scholar who only studied it could not. Ed insisted on standing in all three at once, and the field is what it is because he did.

What he came from

To talk about Ed honestly, you have to talk about the intellectual lineage he came from, because he did not invent himself out of nothing, and none of us do. The lineage runs in two directions, both of which matter.

The methodological line runs through Jay W. Forrester, who founded the field of System Dynamics at MIT in the late 1950s. Forrester built the toolkit for rigorous quantitative modeling of complex industrial and social systems. Ed was the first to apply that toolkit at the full depth of a dissertation: his 1962 PhD work, The Dynamics of Research and Development, was MIT’s first doctoral dissertation in system dynamics, applying Forrester’s methods to the management of R&D. Ed spent the next six decades pulling that approach into the study of management, technology, and entrepreneurship. Reading his 2007 memoir of those years, it is hard to miss how seriously he took the obligation of making system dynamics useful.

The formal lineage runs through the MIT Department of Economics. Ed’s 1962 PhD was in economics, and his advisor of record was Franklin M. Fisher, the eminent MIT econometrician who would later chair the department, serve as IBM’s lead economic witness in the federal government’s antitrust case against the company, and write the central textbook on the identification problem in econometrics. Fisher had earned his own PhD at Harvard in 1960. He appears, as one of Griliches’s associates and coauthors, in Iain Cockburn’s published Tree of Zvi, the genealogy of Zvi Griliches, the Harvard economist who taught generations of scholars (Bronwyn Hall, Adam Jaffe, Manuel Trajtenberg, Josh Lerner, Scott Stern, Rebecca Henderson, and many others) to treat innovation, R&D, and entrepreneurship as measurable economic phenomena rather than ineffable acts of vision.

That makes my own line, if I trace it formally:

Chuck Eesley → Ed Roberts → Franklin M. Fisher → Zvi Griliches.

I confess I had not really registered this until recently. I had thought of the Griliches tradition as the broader empirical context inside which my work sits, important, but parallel, not direct. The dissertation record makes the connection direct. Through Fisher, I am genuinely on the tree.

What this means is that I inherited two things from Ed at once. The system-dynamics methodology (the temperament of identifying an important question first, then building measurement apparatus and letting it push the theory) came from the Forrester line. The empirical-economics commitment to treating innovation and entrepreneurship as measurable, contestable, falsifiable phenomena came from the Fisher / Griliches line. Both are present in the MIT alumni surveys themselves, which are simultaneously a system-dynamics instrument (a measurement apparatus for a complex object) and a Griliches-style empirical innovation study (large-N, longitudinal, structured to support causal claims). Ed inherited both and combined them. I inherited the combination.

Ed and Zvi worked in different fields and on different problems, but barely a mile and a half apart: Ed at MIT, Zvi at Harvard, both in Cambridge, within walking distance of each other along Massachusetts Avenue. What I inherited from each was a common permission: that the things worth studying about entrepreneurship and innovation are the things that can actually be measured at scale and tested. Not because the unmeasurable doesn’t matter (it does), but because someone has to do the patient work of building the empirical floor under our claims. Without that floor, the field reverts to anecdote. Ed and Zvi, in different rooms but the same town, were both building that floor.

What I think this lineage actually is

Academic genealogies (Ed from Forrester, my methodology from the Griliches tradition) sound, when you describe them, like a kind of prestige claim. Look at the names in my line. I want to say very clearly: that is not what this is for.

What a lineage is for is to make obligations visible.

If you are descended, intellectually, methodologically, or personally, from someone who built something that compounds (a measurement instrument, a research tradition, a department, a way of asking questions), then you are not really inheriting the prestige. You are inheriting the responsibility. The most honest thing you can do with that inheritance is push it forward, slightly improved, for someone else.

For me, that has looked like this. Eleven former doctoral students have come out of my group; nine of them hold tenure-track faculty positions today, at Columbia, Oxford, Carnegie Mellon, Johns Hopkins, the National University of Singapore, the Chinese University of Hong Kong, the University of Oregon, the University of San Diego, and Pontificia Universidad Católica de Chile. Several of them have taken on doctoral students of their own. Some of those students will themselves go on to advise.

I do not list any of this to put placement statistics in print. I list it because that is what Ed did. He advised. His students advised. His students’ students are now advising. The arithmetic is long and slow. None of it would have happened without the door he kept open in his office. Now I owe my own students the same door, and the same patience, and the same trust that they will go and do something that compounds again.

What I would say to him now

The last time I saw Ed in person was August 8, 2023. He and Nancy hosted us at their summer home in New Hampshire, and that evening they took us to a concert at the venue Ed sat on the board of. He was, in the way I always remembered him, gracious and curious, interested in my research, in our students, in the family foundation Lijie and I had just started, asking after specific people whose names he had no reason to remember and remembered anyway. We did not know, that evening, that we were saying goodbye. I am very glad we got to see him there.

With Ed and Nancy Roberts at their summer home in New Hampshire, August 8, 2023 — the last time I would see Ed.

I would tell him that the alumni-survey method he handed me has now been extended to Stanford, to Virginia, to Tsinghua, and that the data is producing the kind of empirical claims about the role of universities in the economy that we both used to argue mattered. I would tell him that nine of eleven of my former students are on the tenure track. I would tell him that the 2009 MIT paper he and I wrote together (updated in 2015 by Fiona Murray and Daniel Kim, who found that MIT alumni had launched 30,200 active companies generating roughly $1.9 trillion in annual revenues) is one of the most consequential things I have ever put my name on, and that I understand more clearly now than I did then that the piece of that work he handed me was a gift.

And I would tell him that I am trying, every week, in conversations with my own students that sometimes last ten minutes and sometimes ninety, to do for them what he did for me. To make the next move clearer, and to trust them to make it.

Ed Roberts died on February 27, 2024. The work he started keeps going. That, I have come to think, is what the long arithmetic of advising is actually for.

A Brilliant Intern With No Skin in the Game

Chuck Eesley — Tue, 09 Jun 2026 15:55:57 GMT

This spring I treated my own course as a field experiment.

MS&E 272, Entrepreneurship Without Borders, has always asked students to do the unglamorous core work of starting a company: find a real problem, talk to real customers, narrow to something buildable, and prove that someone, somewhere, will commit. What changed this year is that every student walked in with a tireless research analyst, market-sizer, interview-protocol writer, and prototype-builder already in their pocket.

That forced a question I couldn’t teach around: when the first draft of everything is suddenly free, what is actually left for a founder to do?

So I rebuilt the course to answer it.

The redesign: require AI, then require breaking it

The lazy options were to ban AI or to wave it through. I did neither. Every team had to use agentic tools to generate personas, size markets, draft interview protocols, build prototypes — and then every assignment carried a second requirement: take that output into the world and try to falsify it. Four design choices made that rhythm real, because a principle that isn’t in the rubric is just a vibe.

I graded the prompt log, not just the output. Once AI makes the polished artifact nearly free, the artifact stops being good evidence of thinking. So the prompt log became the thing I assessed: the sequence of questions a team asked, where they pushed back on the model, where they caught it. The reasoning trail is now the deliverable.

Verification became a professional habit, not a disclaimer. Every AI-generated fact had to be checked against a primary source. I was most aggressive about this on geography: these models carry a heavy US/Western prior, which is poison for a cross-border course. Part of the graded work was explicitly catching and correcting that prior: the moment the model assumes Plaid-style banking APIs exist, or that a “no” means the same thing in Jakarta as in Palo Alto.

The rubric used an explicit evidence hierarchy, and it protected customer discovery from automation. This was the change I’d defend hardest. The temptation in the agentic era is to let students “interview” synthetic personas and have AI summarize what customers probably think. So I ranked evidence: AI-synthesized interviews sit at the bottom; real behavioral commitments (money, time, social capital, a signed LOI) are the gold standard. Skin in the game, not expressed interest. The rubric rewards evidence depth far more than polish, precisely because AI makes polish free. (I worked out the underlying argument, why customer evidence has quietly become cheap talk and why AI should shift from generator to verifier, in an earlier piece here.)

More weight on problem-finding than problem-solving. AI is excellent at solving a crisply stated problem and terrible at telling you you’re solving the wrong one. So the framing (can you state, in one sentence, what you’re actually solving?) now carried more of the grade than the solution.

The pattern was baked in from the very first assignment. Before a team had even locked its problem, it had to put a generative model to work as a thinking partner — map the five most underserved sub-problems in its space outside the US, then make it play a skeptical investor who had seen 200 pitches and name the three reasons the idea would fail. But the graded core wasn’t the model’s output. It was an individual reflection in which each student had to point to exactly where the AI hallucinated, oversimplified, or tipped its hand toward Western, US-centric markets, plus an interview target list of at least five real, named people the team would actually contact.

The centerpiece assignment made the order explicit. Customer discovery was structured as a five-step workflow that deliberately sandwiched human contact between two layers of AI. Teams built personas and an interview guide with AI; then practiced against AI-simulated personas to find the weak questions before spending a real person’s time; then ran five to ten interviews with actual humans, the step the assignment flatly calls the irreplaceable core; then fed the transcripts back to AI for pattern-finding while forcing it to steelman the opposite conclusion (make the case that this problem is not worth solving) and to name what it might be misreading; and only then sized the market, with every figure traceable to a named primary source. AI prepared the conversation and synthesized it afterward. It was never allowed to be the conversation.

By the unit-economics assignment later on, the verification rule had hard teeth: teams modeled LTV and CAC with AI-surfaced benchmarks, but any number that traced back only to the model, with no primary source behind it, earned no credit. Across all of these, a one-page AI prompt log was a standard deliverable, and the rubric paid points for catching the AI’s mistakes, not for hiding them. The polish was free; the credit was for the verification.

One more mechanic tied it together. In the final presentations I added a required slide that turned out to be the most interesting thing all quarter: show us where AI got it wrong, once you’d done your real interviews. That single prompt is where the best lines in this essay came from.

The design principle, which one team stated more cleanly than my syllabus did: treat every output as something to verify, not accept.

The room was the counterweight

Reworking the assignments was only half of it. The other half was leaning harder on the one thing AI structurally cannot provide: practitioners who have actually operated across the borders we study. The guest roster this quarter was deliberately international, and in nearly every case the speaker’s value was precisely the local, hard-won knowledge the models invent.

Emon Shakoor, who founded Blossom Accelerator (Saudi Arabia’s first tech-inclusion accelerator), came in early and set the tone: institutions and context decide who gets to build, in ways no amount of prompting will surface. Hans Tung of Notable Capital, a perennial name on the Forbes Midas List, walked the class through how he forms conviction on geographic bets (why the same idea is a different investment in Shanghai, São Paulo, or Menlo Park). Zhengzheng Pan, whose path runs from early Facebook through building Ads from China and leading Ant Group’s green-development work, closed our fundraising arc with how capital actually gets raised outside the US. Steve Schlenker of DN Capital brought the European venture lens, and Priyanka Ladha opened our sessions on India. In the Latin America session, Pedro Vallenilla and Arnoldo Gabaldón, the co-founders of Cashea, now Venezuela’s largest fintech, walked the class through building a consumer-credit business in a country where credit had essentially collapsed, alongside their early investor Iván Montoya of NuMundo Ventures. The African sessions ran two deep: Maxima Nsimenta, CEO of Livara, on building a consumer brand, then a panel of Africa-focused investors including Maya Famodu and Tawanda Sibanda. Throughout, my co-instructor Vimbayi Kajese anchored the inclusion session and the African ones. That roster makes “borders change the product” concrete in a way no slide, and no model, ever could.

The lecture sequence shifted to match. I carved out a dedicated session on AI and value propositions early in the skills module, another on AI’s implications for unit economics in the middle, and a two-part fundraising arc closed out by a practitioner. The point of all of it was the same: by the time a team discovered their AI had confidently invented a regulatory regime in Korea or a procurement cycle in Ghana, they already had a felt sense of what real regional knowledge sounds like, and how different it is from fluent guessing.

What the teams built

The ventures were genuinely good, and notably international: a satellite venture untangling base-station siting and the rules for licensing spectrum across national jurisdictions; a cross-border payments company; a Thai rental-market app for handling lease agreements between landlords and tenants; a dating app; enterprise software for the Egyptian real estate market; and an AI matchmaking tool called Aunty AI. Several were two-sided marketplaces that learned they had to validate both sides before building anything.

But the part worth writing down isn’t the products. It’s what the teams learned about working with the machine, and arguing against it.

AI is a brilliant intern with no skin in the game

The single best line of the quarter came from a satellite team’s closing slide: “AI. A brilliant intern with no skin in the game.”

Their elaboration was sharper than most published commentary on the subject. AI was the best collaborator they had ever worked with and the most dangerous one, often inside the same conversation. It drafted their interview protocol in thirty seconds. It also confidently invented procurement cycles in Ghana, faculty budgets in Indonesia, and regulatory environments in Korea, fluently, without flinching, without ever once telling them it was guessing.

The hardest skill they developed, they said, wasn’t prompting. It was learning to recognize when an answer that sounded right was a hallucination dressed up convincingly. As they put it: AI cannot tell you you’re wrong about something it has no way of knowing, and it will never volunteer that it doesn’t know.

Another team compressed the entire risk into seven words on a slide titled Notes to the Next Batch: “AI makes you faster at being wrong.” That is, I think, the most important sentence any of them produced. The danger of these tools is not that they’re unhelpful. It’s that they make the whole loop (including the loop where you sprint confidently in the wrong direction) run faster.

Borders change the product

The international framing of the course turned out to be the perfect stress test for AI’s limits, because local reality is exactly the thing a model trained on the global average gets wrong.

A cross-border payments team put it best: you cannot first-principles your way to the fact that Venezuela effectively runs on stablecoins as default cash, while India runs on UPI, Brazil on Pix, China on WeChat, and the US on cards. Those aren’t deductions. They’re facts you only get by being there, or by talking to someone who is. The same team concluded that in a market with multiple exchange rates and no neutral source of truth, the hardest thing to build wasn’t infrastructure — it was a number that people would trust enough to act on. Trust was the product.

A satellite team learned the same lesson at human scale: four direct messages with one person living in Korea fixed more than four weeks of desk research. Their advice to themselves, if they could start the quarter over: talk to one person who lives in the space before building anything. The cheapest insurance against being wrong about a place is knowing one person in it.

The thing that got more valuable, not less

Here is the synthesis I didn’t fully expect going in.

I worried that agentic AI would hollow out customer discovery: why call ten landlords when a model will simulate them instantly? The opposite happened. Because AI made personas, models, and prototypes nearly free, the scarce input became contact with reality. The teams felt this viscerally. The team building a Thai rental-agreement app started out designing for landlords only; it was customer interviews, not the AI, that revealed they had to build for tenants too, a finding the model never surfaced and, arguably, never could.

The most striking endorsement of this came from inside the machine. One team’s own AI assistant told them, in so many words, that five hours of customer calls was worth more right now than another five hours of brainstorming with it. When the intern tells you to go outside and talk to people, you should probably listen.

The students felt the same thing from the other direction. One wrote, reflecting at the end of the quarter, that the hardest lesson was realizing that identifying a problem and identifying a problem people will pay to solve are two very different things. Given the chance to start over, they’d have begun customer discovery earlier, treated it as the main work rather than as support for the product, and measured validation by commitment, not compliments. A customer saying “this is interesting” is a very different thing from agreeing to a pilot, an introduction, or a payment.

So the value didn’t disappear. It migrated. It moved from producing the first draft, now a commodity, to the things AI structurally cannot do: holding skin in the game, sitting in the discomfort of a real customer’s contradiction, and exercising the judgment to know when a confident answer is hollow.

What I’m keeping

A few of the teams’ own Notes to the Next Batch are going straight into next year’s syllabus, because they’re better than mine:

• Be able to explain what you’re solving in a single sentence. (It’s harder than it sounds.)

• Interview the person who already failed at this.

• Figure out how to delegate and collaborate early, not in week nine.

• Some things you genuinely won’t know until you build.

• Go-to-market is a filter, not a formality.

• Let yourself be wrong out loud: teammates were most useful to one another when someone said “I think we’re solving the wrong problem” before they had the perfect replacement ready.

The quarter was short, but as one team wrote, the pattern is long. As founders, and as faculty rebuilding courses on shifting ground, we’re going to keep being wrong about new things: about markets, about customers, about what AI can and cannot do. The point was never to get it right by Friday. The point is to get faster and more rigorous at noticing when we’re wrong.

The intern is brilliant. Keep it. Just don’t let it be your CEO.

The note we ended on

We closed the quarter on a subject that has nothing and everything to do with AI: how to last.

Agentic tools make velocity nearly free, and the hidden cost of free velocity is that it smuggles in the worst assumption of hustle culture: that the right speed is always faster. So the final session was about the founder, not the venture: trust as a moat, storytelling as how you earn it, and enough emotional self-awareness to keep good decisions from curdling into burnout. Vimbayi put the blunt version to the room: what are you chasing that’s worth an early death? The companies still standing years from now will be the balanced ones. It’s a marathon you have to be alive (and in a good mental state) to finish.

That’s the through-line I want students to carry: institutions and culture shape who gets to build, entrepreneurship is a team sport, experiments beat opinions, and VC-backed hypergrowth is one path among many rather than the definition of success. Protect the asset (your health, your relationships, your identity outside the company), because the reason to build something is to be there for the part that matters. We sent them off with Apple’s old “Think Different” spot: an invitation to be the ones a little crazy enough to try.

Thanks to my co-instructor Vimbayi Kajese; to course assistants Yikai Cao and Xinyu Chang; to all the mentors who gave their time to coach teams through the quarter; to the judges who sat through the final presentations and gave students candid, real-world feedback; and to every team, whose closing slides this essay shamelessly steals from.

If you want the research this redesign rests on: a field experiment by Nick Otis and colleagues on GenAI and entrepreneurial performance; a field experiment with Yanbo Wang showing that being randomly assigned an entrepreneur as a mentor measurably shifts students’ career paths, especially for those whose parents weren’t entrepreneurs; a randomized experiment with Lynn Wu on how mentor-network diversity and adaptability shape venture outcomes; and a study with Yong Suk Lee finding that university entrepreneurship programs don’t necessarily produce more founders, but do help the founders they produce raise more capital and scale faster. The redesign is an attempt to act on both lessons: better mentorship, and an education that makes the founders we do train measurably sharper.

We weren't ready. We started anyway.

Chuck Eesley — Mon, 08 Jun 2026 22:10:19 GMT

Five years ago this past January, Lijie and I filed the paperwork for the Zhou & Eesley Family Foundation. We were still in the middle of the pandemic. We were both working full-time jobs. We had never run a foundation before.

What we had was time. Specifically, the strange in-between time the pandemic created: months when commutes vanished, evenings got longer, and you could spend a weekend reading IRS Publication 557 cover to cover. We used some of that time to do the research and paperwork it takes to start a private foundation.

I had always assumed we would wait. I think most people assume they will wait. The standard story is that you accumulate, you build, and at some later point, maybe retirement, you turn around and start giving back. We had the same story in mind, more or less, until early 2020 made the need impossible to ignore and the calendar impossible to fill any other way.

We started before we were ready. Five years in, it is the best decision we have made.

What we set out to do

The Foundation is small. By the end of 2025, total assets sat at $1.89 million, with 14% deployed into mission-related and program-related investments and the rest funding our direct programming. We are not a Gates. We are not a Hewlett. We are two people running an operation on nights and weekends, alongside an advisory board that does the work of holding us accountable.

What we set out to do was simple to say and harder to do: support computer science and entrepreneurship education in communities that mainstream programs overlook, with particular focus on women, underrepresented founders, and ventures advancing the UN Sustainable Development Goals.

What we’ve actually done

Five years on, we have worked in nine communities. San Francisco State University, where Lijie’s alumna ties opened the very first partnership. Molokai. Taiwan. Vietnam. Malaysia. Tanzania. Uganda. Paris. And the MIT MEET program, which serves Israeli and Palestinian students together.

Our 2025 work at the Penang Science Cluster in Malaysia reached roughly 2,500 students through a teacher multiplier of about a hundred students per teacher trained. That number has taught us as much about leverage as anything else we have done. Uganda is a hybrid. The refugee-entrepreneur research there is my own Stanford project, but the Foundation came in alongside it, funding microloans and continued support for a subset of participants through a collaboration with Challenges Uganda. In Tanzania, our work with the LOHADA children’s home started through a former student and grew into a multi-month relationship. At one point we bought a local painting and auctioned it back home, raising $1,854 toward a tractor for the home. Most of what we do looks like those last two: long-running and hands-on.

On the investment side, we hold eleven active positions: nine mission-related and two program-related. They range from Synchron’s brain-computer interfaces (restoring digital autonomy after paralysis) to Oze’s fintech for West African small businesses to Empo Health’s early detection of diabetic foot ulcers. That last one is personal. I lost my father to complications from a diabetic foot ulcer in 2015.

What we would do differently

We started as a non-operating foundation. About two years in, we converted to an operating foundation. We should have started that way.

The difference is technical in name and practical in everything else. A non-operating foundation primarily makes grants to other charitable organizations. An operating foundation directly runs its own programs. The IRS rules, the payout requirements, and the daily work are all different.

We thought we wanted to be grantmakers. What we wanted, it turned out, was to teach in Molokai and Penang and Tanzania ourselves, to run teacher-training cohorts with our hands on the materials, to build long relationships with partners where we were not the funder of record but a co-laborer. That is operating-foundation work. We backed into it without naming it for two years, and the structure we had was not built for it.

If you are thinking about starting a foundation and you know in your bones that you want to do the work, not just fund it: start as an operating foundation. Save yourselves the conversion.

What we got right

The best structural decision we made on day one was the board of independent directors, which we also call our advisory board.

We could have run this as a two-person operation. We are a two-person operation. But from the beginning, we invited a small group of people whose judgment we respected and who had no incentive to flatter us, and we asked them to tell us when our decisions looked wrong. We did not want advisors in the polite sense. We wanted people who would stop us from doing things we would regret, and more than once they have. We made them full voting members of the board.

I think small foundations underuse this. We tend to think of accountability as a thing imposed on large institutions. But a small foundation runs into the same cognitive traps as a large one: falling in love with your own pet partnerships, overweighting recent wins, expanding too fast, underestimating the operational cost of a new commitment. Having people who can say “no, not that one, not yet” has been worth more than the time it costs to brief them.

What surprised us

Three things, briefly.

First, the multiplier from training teachers rather than teaching students directly is much larger than we expected. A teacher we train will reach roughly a hundred students. A student we teach directly is one student reached. We knew this in principle. Watching it work in Penang reset our default question from “how many students did we reach?” to “how many teachers are now running this on their own?”

Second, mission-related investments feel less like venture deals and more like our program partnerships. We thought we were building two parallel portfolios, programs on one side and investments on the other. They turn out to be the same thing in different forms: multi-year relationships with people doing the work in places we care about.

Third, the part-time constraint is more of a feature than a bug. We cannot spread ourselves across thirty geographies, so we have learned to say no to most things and yes to a few long ones. A foundation we ran full-time would, I suspect, do less good and feel less alive.

What we’re trying next

The next five years: deeper teacher cohorts in places we already work, careful expansion only where there is a clear long-term reason to be there, more program-related investments (including equity, not just the below-market loans we have made so far), and a real effort to measure whether the work moves outcomes. The Uganda microloan collaboration, paired with a research study on the same population, is a first step toward that kind of evidence.

If you have been waiting

If you are reading this and thinking about starting something, whether a foundation, a fund, or a recurring small commitment, and your story is “I’ll do this later, when I’m more ready”:

Start now. You will not be more ready. You will know more later, but the learning is the work, and you can only do the learning by being in motion. You can do it part-time. You can do it small. You can do it badly at first and improve as you go. The thing you cannot do is do it in the future you keep promising yourself.

Five years in, the most surprising thing about doing this is how much we look forward to it. We started this thinking it would be the work of our retirement. It turned out to be the work that makes the rest of the work feel like it adds up to something. We are very glad we did not wait.

Lijie’s reflection from the same five years is here. She tells a different version of this story.

Theo Baker’s Stanford Is Real. It Just Isn’t Most of Stanford.

Chuck Eesley — Tue, 26 May 2026 00:43:16 GMT

My wife Lijie published a piece this week reacting to Theo Baker’s new book about Stanford. She made a point I won’t try to remake: that elite access is real, and the more important question is what people do with it. I’d encourage you to read her piece. She argues better than I could that service is the better return on the kind of access Stanford provides.

I want to add the empirical companion, because I've been watching the press cycle around Baker's book with one question on my mind: what about the other 97%?

Baker is a real journalist. He earned a Polk Award as a freshman, broke the story that ended a Stanford president’s tenure, and has written a vivid book about what he saw during four years embedded inside the institution. The set pieces are sharp. An uncredited secret seminar taught by a Silicon Valley CEO. Freshmen courted with caviar dinners by venture capitalists. An “incubator with dorms” where talent is sniffed out at orientation. None of it is invented. The seminar exists, the dinners happen, the pattern is real.

To his credit, Baker doesn't claim this is most of Stanford. He's explicit that the freshmen flagged for unicorn potential — what he calls the Plucked — are a small subset, and part of what makes the access feel illicit is that it isn't widely distributed. The empirical question I want to add is what happens to everyone else: the rest of any given Stanford graduating class who don't get courted with caviar, who never see the secret seminar, and who are largely missing from both the press coverage and the policy imagination that runs off it.

I’ve spent fifteen years studying this. With the late William F. Miller, former Stanford Provost, I built a multi-decade dataset of Stanford alumni-founded companies, covering every cohort and every identifiable venture, with longitudinal data on outcomes. The aggregate numbers are familiar by now: roughly 40,000 active companies tracing back to Stanford, $2.7 trillion in annual revenue, the rough equivalent of a top-ten national economy if the alumni cohort were a country. The 40,000-company figure appears in Baker's book, where he attributes it to "a 2011 study." That study, with Miller, is the one I've been describing. The revenue and economy framings appear in nearly every press release Stanford has issued about its role in Silicon Valley over the last decade.

What the dataset also lets us see is the distribution. And the distribution doesn’t match the headline.

The average Stanford alumnus who became a founder started their first company roughly ten years after graduation. Not at orientation. Not as an undergraduate. About a decade out, typically after working for someone else first, sometimes for that whole decade. Most never appeared in any “secret seminar.” Most got their first investors through normal channels: a former classmate, a faculty contact, a Series A pitch against fifty other startups.

About 8 percent of Stanford alumni founders started their first company within a year of graduation. About 28 percent started within five years. The remaining 72 percent waited longer than that.

The ‘quick founder’, the freshman or new graduate building a billion-dollar company, is the archetype closest to Baker’s subjects. It’s also a small minority of an already-small subpopulation of Stanford alumni: maybe two or three percent of any given graduating class. They’re the ones the press writes about. They are not most Stanford founders. They are an interesting subpopulation, not the population.

This matters because the policy stakes of getting it right are large. Governments nearly everywhere are trying to engineer a version of Silicon Valley: the Inflation Reduction Act, the CHIPS Act, EU innovation programs, China’s NEV subsidies, every American state’s “Silicon Valley of [X]” initiative. The mental model running through these efforts is something like Baker’s: elite institution plus ambient venture capital plus secret networks equals founders. If that’s right, the policy problem is to recreate the elite institution and the venture capital, and the founders will follow.

The data suggest something more specific.

In work with Yong Suk Lee, published in the Strategic Management Journal, we used a quasi-experimental approach to estimate what Stanford’s main entrepreneurship programs actually do. The Center for Entrepreneurial Studies at the Business School and the Stanford Technology Ventures Program at the Engineering School were introduced at different times in the mid-1990s, which let us compare cohorts who had access to each program against those who didn’t. The finding runs against almost everyone’s intuition.

These general programs do not appear to increase the rate of entrepreneurship. In some specifications, participation in the Business School program is associated with a roughly 35 percent decrease in the likelihood of starting a company. But the startups that do emerge perform better. Lower failure rates, higher revenues. The mechanism appears to be informational. Students learn enough about what entrepreneurship actually requires to figure out whether it’s a good fit for them. A meaningful share, having learned that, decide it isn’t. The remaining founders are better-matched, better-prepared, and produce better outcomes.

That’s a different story from “entrepreneurship can be taught.” It’s closer to “entrepreneurship can be evaluated,” and the institutional mechanisms that produce good evaluation are not the same ones that produce hype.

Selective, pre-venture programs look different again, and this is where the strongest causal evidence now sits. In a paper currently under review at Management Science, my co-authors Stefan Weik, Michael Fröhlich, Aaron Defort, Isabell Welpe and I study the Center for Digital Technology and Management — CDTM — a selective pre-entrepreneurship program in Munich that admits roughly 25 students per cohort from several hundred applicants. CDTM ranks applicants by composite interview scores with a sharp capacity cutoff. Candidates just above the cutoff get in; candidates just below mostly don’t. Their interview scores are nearly identical. The design lets us compare what happens to functionally equivalent people on opposite sides of an arbitrary line — the cleanest test currently available of whether selective programs cause high-quality founding or merely select the talented who would have succeeded anyway.

Three findings matter for the present debate. First, program participation more than doubles the founding rate, and the entire effect is concentrated in high-growth ventures. The probability of raising $10 million or more in venture capital rises from 0.7% in the control group to roughly 9% among participants — an order-of-magnitude shift. There is essentially zero effect on low-growth or lifestyle ventures. Second, the mechanism is not what most people guess. Program grades do not strongly predict whose ventures succeed. What predicts success is the thinness of a participant’s pre-existing network: engineering and computer science students, who arrive with fewer entrepreneurial connections, benefit substantially; business students, who arrive better connected to the relevant capital and talent pools, show essentially no treatment effect. Third, roughly 73% of participant co-founding relationships form across cohorts rather than within them, and 23% of participant founders receive early funding from program alumni acting as angel investors. The program is functioning as a multi-year matching market, not a curriculum.

Two pieces of context matter. CDTM operates in Munich — outside the Bay Area, outside the established VC ecosystem. The mechanism transfers. And the mechanism itself isn’t ambient Silicon Valley magic. It is a specific, designed institutional structure: competitive admission, cross-disciplinary cohorts, sustained multi-cohort alumni networks that act as both co-founder pools and informal capital. Stanford’s Mayfield Fellows Program shares these features. The Instagram co-founding story — Kevin Systrom and Mike Krieger from different Mayfield cohorts, connected through the program’s network — is the same cross-cohort matching pattern the CDTM data identifies more rigorously. Two independent settings, with very different identification quality, point in the same direction.

The Stanford effect, in other words, is not produced by ambient ecosystem magic. To the extent we can measure it causally, it appears to be produced by specific, identifiable, replicable institutional mechanisms.

There’s also a timing problem with Baker’s account that I think gets undersold. Baker was a freshman in fall 2022. His four years at Stanford coincided exactly with the most extreme AI funding cycle in technology history. The pattern of VCs paying freshmen to drop out, courting eighteen-year-olds with model dinners, and treating Stanford as a unicorn-spotting frontier intensified dramatically during the GPT-3-to-GPT-5 window. It’s real, but it’s also a peak-moment phenomenon, not a steady-state condition. Cohorts from 2008, or 1998, or 1988 had different experiences because the environment around them was different. The Stanford the press is currently scrutinizing is partly Stanford and partly the AI boom that happened to coincide with Baker’s enrollment.

This is the kind of distinction the longitudinal data and design-based evidence make legible and journalism mostly can't. A journalist describes the Stanford he saw. A researcher with forty years of cohorts and a regression discontinuity can say which features of that Stanford are durable institutional patterns and which are products of the specific moment the journalist happened to observe.

None of this is a defense of the seminar, or the dinners, or the broader pattern Baker is right to find unsettling. It’s not a defense of Stanford either. It’s the longer-horizon version of the same observation. The reason most Stanford alumni who became founders don’t look like Baker’s subjects isn’t that Stanford lacks the elite-access world he describes. It’s that the elite-access world is much smaller than the headline implies, and most of the institutional work that actually produces founders happens elsewhere. In less photogenic places, on longer timescales, through mechanisms that don’t make for vivid scene-setting.

Baker has written the book about the part of Stanford that's easiest to see. With the data and the causal evidence, we can also describe the part that's harder to see but does more of the work. Both are true. Both are worth understanding.

If you’re interested in which institutional mechanisms, at Stanford and elsewhere, actually move the needle on who becomes a founder, I’ll be writing more about that here over the coming months. Lijie and I are also working on this through our Foundation, applying what we’ve learned to settings where the resources are scarce and the leverage is high. If you want to follow that work, her piece is the place to start.

Stanford is real, the access is real, and the question of what it's for is the right question to ask. The evidence suggests that most of what Stanford produces is built through more ordinary institutional machinery than the press cycle implies. And that's actually the more useful finding. Ordinary machinery is something other institutions — in Munich, in Hsinchu, in places without caviar dinners — can build.

General-program findings drawn from Eesley & Lee, "Do University Entrepreneurship Programs Promote Entrepreneurship?" Strategic Management Journal, 2021. CDTM findings from Weik, Fröhlich, Defort, Welpe & Eesley, "Pre-Entrepreneurship Programs and the Quality of Entrepreneurship," working paper currently under review.

Why we're betting more on teachers

Chuck Eesley — Mon, 11 May 2026 04:18:14 GMT

This past December, the Zhou & Eesley Family Foundation ran a program at the Penang Science Cluster in Malaysia. Roughly twenty-five teachers came in for training in AI literacy and design thinking. We taught for a few days. We left.

In the months that followed, those teachers, without further intervention from us, integrated the new curriculum into their own classrooms. The six teachers who have responded to the post-training survey so far report directly reaching 611 students between them. Extrapolating across the full cohort, the curriculum likely reached on the order of two thousand five hundred students.

That’s roughly 100 students reached per teacher trained. By the measures we have so far, this was the most effective program our small foundation has ever run.
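The extrapolation above is simple proportional scaling, sketched here under one loud assumption: that the six teachers who answered the survey are typical of the whole cohort of roughly twenty-five. The constant and function names are illustrative and are not part of any PSC evaluation tooling.

```python
# Back-of-envelope extrapolation of the Penang multiplier.
# Inputs come from the survey figures quoted in this post; the proportional
# scaling is an assumption (it treats the six respondents as typical).

RESPONDING_TEACHERS = 6   # teachers who answered the post-training survey
STUDENTS_REPORTED = 611   # students those six report reaching directly
COHORT_SIZE = 25          # approximate number of teachers trained


def students_per_teacher(students: int, teachers: int) -> float:
    """Average number of students reached per responding teacher."""
    return students / teachers


def extrapolate_reach(per_teacher: float, cohort: int) -> float:
    """Scale the respondents' average up to the full cohort."""
    return per_teacher * cohort


per_teacher = students_per_teacher(STUDENTS_REPORTED, RESPONDING_TEACHERS)
total = extrapolate_reach(per_teacher, COHORT_SIZE)

print(f"students per teacher: {per_teacher:.1f}")  # prints 101.8
print(f"extrapolated reach: {total:,.0f}")         # prints 2,546
```

If the respondents were the most engaged teachers, which is plausible for a voluntary survey, the full-cohort figure will overstate actual reach. That is one reason to read 2,500 as an order of magnitude rather than a headcount.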

It is also the program that finally rebalanced how Lijie and I think about what philanthropic education work is for.

What we already knew, and weren’t acting on.

Lijie has been around teacher training for far longer than the Foundation has existed. Before her engineering career, before Silicon Valley, before us, she ran teacher-training programs in rural China and worked as a program manager at China’s Ministry of Education. She had spent years watching what good teacher professional development (PD) could do at scale, and what poor teacher PD couldn’t do at any scale.

So when we co-founded the Foundation in 2021, she’d already been making the multiplier argument for a decade. I (Chuck) was the holdout — my instinct, coming from Stanford research, was to value direct contact: be in the room, see the students, capture the texture of the work in real time. We compromised by doing both, often inside the same program: our 2023 LOHADA visit in Tanzania trained roughly twenty teachers alongside the fifty-plus students we taught directly, and our work at Fulbright University Vietnam has consistently paired teacher-development sessions with the student-facing programming. But the bias tilted toward students in the room — that’s what we showed up to do, and what we celebrated. Penang is where the math finally became the headline.

What 100-to-1 looks like.

Penang is where Malaysia builds its semiconductors. Intel, AMD, Lam Research, Bosch — they all have their fingerprints on the island. It is a place that knows what high-skilled technical labor looks like and is short of it.

The Penang Science Cluster is a non-profit founded by industry leaders to build the technical-talent pipeline. Their model is the right one: convene teachers from rural schools across the state, train them in disciplines the schools don’t know how to teach, and let those teachers go back and teach. We were brought in to contribute the AI and design thinking module, a small input into a much larger program PSC was already running well.

The module was one day of intensive work with the cohort of teachers, built around a curriculum we’d developed specifically for the way Malaysian middle schools actually work. (You’d be surprised how much philanthropic curriculum dies on contact with a school where the kids don’t have laptops and the teacher only has 35 minutes a week with each class.)

After the training, the teachers brought what they’d learned back to their classrooms. PSC ran a post-training evaluation, with six teachers and eleven students responding so far.

The numbers from that subset: 100% of responding teachers adopted at least one new AI tool or design thinking technique — most are using ChatGPT, with Gemini and rapid prototyping in the mix. Every responding teacher reported saving time through AI workflow integration — most one to two hours per week, the rest three to five. Of the eleven students surveyed, 73% reported discovering a career path they had not seriously considered before — software engineering, chemical engineering, product management, AI research, public policy, even chef — and the remaining 27% said the workshop gave them new specifics on a path they already liked. Teachers rated the workshop 8.7 out of 10 on average for “would you recommend a colleague”; students rated it 4.6 out of 5; and on a separate question, students rated their confidence in using AI to research and plan their futures at 4.1 out of 5. Survey responses are still coming in; these are the early numbers.

The numbers miss what’s actually interesting. A teacher, in their own words:

“Since the workshop, I have started integrating ChatGPT into my teaching workflow by using it to help design lesson plans, generate discussion questions, and create differentiated learning materials for students. It has also been useful for quickly drafting emails and administrative documents, which saves time and allows me to focus more on student engagement.”

A student:

“I have discovered to become an AI Researcher and Public Policy Researcher. What draws me to this career path is that it combines both my interests and my strengths… Before joining this workshop, I felt unsure and unclear about my future career direction. This workshop gave me the opportunity to explore my interests and skills more deeply, and helped me see how they can be applied in ways that are beneficial not only to myself, but also to society.”

Those are the easy numbers and their human counterparts. What I want to talk about is what they imply, and — equally — what they don’t.

The thing about the math.

If you spend $X to teach 50 kids directly for a week, the marginal cost per student-hour is high, the long-tail effect is approximately zero, and the work doesn’t accrete. The fiftieth kid is not measurably better off because you taught the first forty-nine.

If you spend roughly the same $X to train twenty-five teachers, who then teach roughly a hundred kids each per cohort, year over year, the cost per student reached collapses, the long tail is the dominant term, and the work compounds.
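The comparison above can be made concrete with a small calculation. The budget figure is a placeholder (the post deliberately leaves it as $X), and the assumption that every trained teacher keeps teaching the material to a new group of a hundred students each year, with no attrition and no further spending, is hypothetical and generous. The function names are illustrative.

```python
# Hypothetical cost-per-student comparison: direct teaching vs. training teachers.
# The budget is a placeholder; the ratios do not depend on its value.


def cost_per_student_direct(budget: float, students_taught: int) -> float:
    """Direct teaching: the whole budget buys one week with one group."""
    return budget / students_taught


def cost_per_student_via_teachers(budget: float, teachers: int,
                                  students_per_teacher: int, years: int) -> float:
    """Teacher training: each trained teacher reaches a new group every year.

    Hypothetical assumption: every teacher keeps using the material for
    `years` years, with no attrition and no additional spending.
    """
    students_reached = teachers * students_per_teacher * years
    return budget / students_reached


BUDGET = 10_000.0  # placeholder for the post's $X

direct = cost_per_student_direct(BUDGET, students_taught=50)
for years in (1, 3, 5):
    via_teachers = cost_per_student_via_teachers(
        BUDGET, teachers=25, students_per_teacher=100, years=years)
    ratio = direct / via_teachers
    print(f"after {years} year(s): direct teaching costs {ratio:.0f}x more per student reached")
```

Because the budget cancels out, the ratio reduces to 25 × 100 × years / 50, or 50 times per year of continued use: roughly an order of magnitude in the first year alone, before later years compound it. The sketch leaves out the cost of the wraparound (teacher selection, follow-up coaching, measurement) that a multiplier like Penang's depends on.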

This is not a hard observation. It is something the larger foundations figured out decades ago. (The Gates Foundation’s K-12 work, when they were doing it, was almost entirely teacher-training oriented for exactly this reason.) But it is something small foundations are slow to internalize, because direct teaching has a kind of moral romance that teacher training does not.

When I tell people the Foundation reached thousands of students in Penang last year, the natural follow-up is: “Wow, were you there?” And the honest answer is: no. We trained roughly twenty-five teachers in December. Those teachers reached their students in February, in March, in April. We never met them.

That is not a failure mode. That is the point.

The honest limits.

I want to be specific about what this analysis is not claiming, because someone with serious training in education research will land most of these critiques the moment they read the numbers.

It is not claiming the 100-to-1 multiplier is generalizable. PSC is an unusually well-resourced partner: industry-funded by Intel, AMD, Lam Research, and Bosch; staffed by people who recruit teachers from rural schools across the state; equipped with rigorous evaluation infrastructure we did not have to build. The teachers we worked with were carefully selected: motivated, experienced, ready to adopt. Most teacher-training programs run with self-recruited teachers in contexts without that wraparound. In those settings, the multiplier collapses.

It is not claiming teacher PD generally works. There is a long, sobering literature on teacher professional development: most one-shot training has near-zero effect on student outcomes. The mechanism that distinguishes the rare PD programs that do work, Penang among them, is the wraparound: rigorous teacher selection, multi-day immersion, curriculum specific enough to use Monday morning, embedded measurement, and follow-up that turns one-shot training into a multi-year relationship. Teacher training without that wraparound is direct teaching in a different costume: fewer kids reached, same lack of compounding.

It is not claiming we did all the work. PSC did the teacher recruitment, the venue, the school relationships, the follow-up coaching, and the measurement that produced every number I’m citing. We provided the AI and design thinking module. The multiplier belongs as much to them as to us. Probably more.

And it is not claiming we measured what actually matters most. A “would you recommend this” rating measures satisfaction; “discovered new career paths” measures self-reported insight; “adopted at least one new technique” measures observable behavior. None are what a serious education researcher would call a learning outcome. A six-month follow-up showing students retained AI concepts they didn’t have before, or that the trained teachers’ classes outperform peers on independent assessment — that would be rigorous measurement. We don’t have that yet. We’re building toward it for the next cohort.

What we are claiming, more modestly: with the right partner and the right wraparound, in one program in one country in one year, we produced an order-of-magnitude better cost-per-student-reached than direct teaching, and the early indicators are unusually strong. Enough to shift the bias of our 2026 programming. Not enough to recommend every small foundation drop direct teaching tomorrow.

Why we still teach directly.

We have not stopped teaching directly. We will not stop. Three reasons matter.

The first is signal acquisition. You learn things in a classroom you cannot learn from a teacher’s after-action report or a survey instrument. When Lijie taught at LOHADA in Tanzania, what she came back with was not a number — it was an intuition for which parts of our entrepreneurship curriculum survived contact with East African secondary students and which parts collapsed. That intuition then went into the teacher-training material we now run elsewhere, including the teacher cohort at LOHADA itself. Direct teaching is the R&D function of a teacher-training operation.

The second is relationship. The Foundation’s most durable partnerships (Tanzania through my former student James Juma, Vietnam through my former student Bao Phan, MIT MEET through its leadership) came from being physically present early on. The people who later opened doors for us, invited us back, and introduced us to other partners did so because they met us in a room with students. We will keep showing up in rooms with students.

The third is that we love it. This may sound trivial in a piece otherwise devoted to math. It isn’t. There is something about teaching a class of kids in Molokai or Kampala that we don’t get from any other part of our lives. Optimizing the Foundation entirely for measurable multiplier effects would optimize out the part of the work that gives us the energy to do the rest of it. That is a bad trade.

The rebalancing is exactly that — a rebalancing, not a replacement. Most new programming dollars now bias toward teacher-anchored models. Some fraction continues to fund direct teaching, treated explicitly as research, relationship-building, and the part of the work that keeps us human.

Stepping back.

I’ve been lucky to spend much of my career teaching: Stanford undergraduates, doctoral students, corporate executives in dozens of countries, engineers at companies that don’t usually let outsiders in. The K–12 and refugee work the Foundation does is one slice of that. It’s worth saying out loud why teaching in these settings matters — to the people in the room, and to us.

For the people in the room: most of what’s worth knowing about how innovation, entrepreneurship, and AI actually play out doesn’t make it into textbooks. The frameworks that work, the failure modes you’d never anticipate, the way institutional context changes everything — these get transmitted person-to-person, classroom by classroom. Students who never get into a room with someone who’s spent twenty years thinking about how this stuff works are not less talented. They are under-resourced. Teaching is one of the few interventions that closes that gap directly.

For us: teaching is where the research becomes legible. I leave a classroom of Korean executives, or Vietnamese engineers, or a Penang teacher cohort, with sharper questions than I came in with. The doctoral student who pushes back on a framework, the executive who tells you why your model breaks in their industry, the teacher who explains which part of the curriculum students actually struggle with — these are inputs no peer-review process produces. From a research perspective, the Foundation’s K–12 teaching, our work with refugee entrepreneurs in Uganda, corporate teaching, and the Stanford classroom are the same activity: they are the source of the next paper, the next program, the next correction to what I thought I knew.

Teaching is not what you do once you’ve stopped learning. It is one of the better ways to keep learning. The Foundation exists, in part, because we want our work to keep being shaped by what we hear from the people we teach.

Why we work in partnership.

The honest version of “we trained twenty-five teachers” is “we contributed an AI and design thinking module to a teacher-training program built and run by the Penang Science Cluster.” We didn’t invent the multiplier model. We didn’t recruit the teachers. We didn’t build the relationships with the rural schools, or the evaluation infrastructure, or the years-deep credibility with industry partners that makes PSC the convener it is. We were the new collaborator at a table other people had set.

This is not a complaint. It is the model.

A small foundation cannot — and should not try to — replicate what a serious operator like PSC has built. Their staff have been at this far longer than we have. They have institutional relationships, regional credibility, and an evaluation discipline we are still building toward. We were inspired by what they’ve put together, learned from how they run it, and contributed the specific thing we could contribute well — AI and entrepreneurship curriculum built on Stanford research, plus the Stanford brand and connections that opened doors PSC could then walk through. Partner brings the structural thing. We bring the specific input. That is the only sensible model for a foundation our size.

This is not unique to Penang. Every Foundation program is built around a partner who knows their context better than we ever will: MIT MEET in Jerusalem, LOHADA in Tanzania (anchored by James Juma, where the work pairs teacher training with direct student teaching), Fulbright University Vietnam (the same combined model), Makerere University Business School and Challenges Uganda for the refugee entrepreneurship work in Kampala, Kaunakakai Elementary on Molokai through teacher Kawika Gonzales, ITRI in Hsinchu, the SFSU CS department before its alumni network took over the work entirely. We bring what we have. They bring what they have. The work compounds because of the partnership, not despite it.

Without the partner, the multiplier is a hypothesis. With them, it’s the thing.

What this means if you run a small foundation.

A few practical observations from the last year:

1. Teacher training without curriculum is a waste. The teachers we worked with in Penang were motivated and ready to adopt. What they needed wasn’t inspiration. It was material — a curriculum specific enough to use on Monday morning and flexible enough to fit their actual classroom constraints. Most of the philanthropic AI-literacy material out there does not survive that test. Building good curriculum is the most underrated thing a small foundation can fund.

2. Multipliers compound only with continuity. A one-shot teacher-training program with no follow-up is direct teaching in a different costume. The foundations getting real multiplier effects are the ones building multi-year relationships with the same partners — annual cohorts, refreshed materials, alumni teachers becoming mentors for the next cohort.

3. Track the kids you don’t meet, and measure what they actually learned. If your evaluation framework only counts students you taught directly, you’ll inadvertently optimize away from teacher training. But also: if your framework only counts recommendation ratings and “adopted at least one tool,” you’ll mistake satisfaction for learning. Build both behavior and outcome metrics into the partnership upfront.

4. Direct teaching is your R&D budget; account for it that way. If you cut direct teaching to zero, you’ll be running a teacher-training operation with stale material and no relationship pipeline within three years. Treat your direct-teaching work explicitly as research and relationship-building, fund it accordingly, and don’t apologize for the lower headline numbers.

5. Find your operating partners and let them lead. A small foundation that tries to be the operator will underperform a small foundation that finds excellent operators and contributes the specific input those operators don’t already have. We did not build PSC. We brought a curriculum module to a model they had spent years developing. If you are sizing up a new program and you’d be the operator, be honest about whether you should be — and if there’s a serious operator already in that geography, ask whether you’d do more good as their collaborator than as their parallel.

Where this leaves us.

The Penang program will run again this year, with more teachers, stronger curriculum, deeper follow-on, and outcome-based measurement we should have built in the first time. We are looking at similar teacher-anchored, partner-led models for our work in rural Hsinchu and at Kaunakakai Elementary on Molokai. We continue to do direct teaching — usually paired with teacher-training cohorts in the same program — at LOHADA in Tanzania, at Fulbright University Vietnam, with refugee entrepreneurs in Uganda, and with university students wherever the cohort itself is the multiplier. Both because it works for those audiences, and because it keeps the rest of the operation honest.

There is a version of this story that ends with: “and so we figured it out, and now we know how to do philanthropy at scale.” That is not the story I am telling. We figured out one thing about one program in one place, in close collaboration with a partner who’d been figuring it out for years before we showed up. We have eight programs in eight places, and the model that worked in Penang may or may not transfer to a refugee settlement in Uganda or a rural school on Molokai. We are betting that a version of it will.

Small foundations like ours often spend their first few years confused about whether they are a teaching organization, a funding organization, or an operating organization. We are not, mostly, any of those alone. We are an organization that finds excellent partners, equips them with what they need and don’t already have, contributes what we can do well, and keeps a small, deliberate, joyful slice of the direct work for ourselves — for reasons that have less to do with measurable impact than with knowing what the work actually feels like.

Twenty-five teachers reached on the order of two thousand five hundred kids without us. They reached them through a program PSC built. With curriculum we contributed. That partnership is not despite our smallness. That partnership is what makes our smallness work.

— Chuck

ARR Is Not the Problem. The Institutional Vacuum Around It Is.

Chuck Eesley — Sun, 03 May 2026 19:27:38 GMT

Last month, Cluely co-founder Roy Lee admitted on X that the $7 million in annual recurring revenue he had given a TechCrunch reporter was, in his own words, “BS.” The actual figure was $5.2 million — a 35% gap. The confession lit up financial Twitter for a week, anchored a Bloomberg piece by Annie Bang asking whether ARR has become “the least-trusted metric of the AI era,” and prompted the usual round of think-pieces about founder ethics.

I was quoted in that Bloomberg piece, and the framing I gave Annie — that the startup world is “a bit more of a Wild West,” with no audit requirements and no cop on the beat — has been the part most readers shared. I want to use this post to say what I didn’t have room to say in 200 words of quoted speech: this is not a story about one founder, and it is not, in any deep sense, a story about ARR. It is a story about what happens when an ecosystem builds an investment thesis around a metric with no agreed-upon definition, no enforcement mechanism, and no countervailing institution incentivized to police it.

That’s a story economists and organizational scholars actually have tools for. And the policy implications are not the ones most commentators have been reaching for.

Three structural reasons ARR is decoupling from real revenue

The naive ARR calculation is one month of subscription revenue × 12. It works when three conditions hold: subscription pricing is the dominant model, customer retention is high enough that next month resembles this month, and contract structure is reasonably uniform across customers. SaaS in roughly 2010–2020 met all three. AI in 2024–2026 meets none of them.
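To make that sensitivity concrete, here is a minimal Python sketch of the naive month × 12 calculation applied to six months of revenue from a hypothetical usage-heavy product. The monthly figures are invented for illustration; the only point is how far the annualized number moves depending on which month gets multiplied.

```python
# Naive ARR: annualize a single month of revenue by multiplying by 12.
# The monthly figures below are hypothetical, chosen only to show how
# sensitive the naive calculation is to which month you pick.


def naive_arr(monthly_revenue: float) -> float:
    """Annualize one month of revenue the naive way (month x 12)."""
    return monthly_revenue * 12


# Hypothetical monthly revenue for a usage-heavy AI product, in dollars.
monthly = [300_000, 420_000, 610_000, 380_000, 350_000, 400_000]

arr_by_month = [naive_arr(m) for m in monthly]
trailing_sum = sum(monthly)  # what customers actually paid over six months

if __name__ == "__main__":
    print(f"Naive ARR from best month:  ${max(arr_by_month):,.0f}")
    print(f"Naive ARR from worst month: ${min(arr_by_month):,.0f}")
    print(f"Six-month actual, doubled:  ${trailing_sum * 2:,.0f}")
```

With usage-driven months this lumpy, the same business can honestly report anything from about $3.6 million to $7.3 million of "ARR" depending on the month chosen, while its trailing revenue annualizes to roughly $4.9 million.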

First, AI customers experiment. Enterprise budgets right now have unusually large discretionary lines for “AI exploration” — every CIO has been told by their board to have an AI strategy. That money flows into trials. A trial signed in March counts as ARR in March. The customer’s actual decision — does this tool earn its seat at renewal? — happens in September. By then, ARR has already been booked, reported to investors, and used to justify a markup at the next round. Net revenue retention numbers, if they were available, would tell a different story; they generally aren’t, because most AI startups are too young to have meaningful 12-month cohorts yet.

Second, pricing has shifted. A growing share of AI revenue is usage-based — tokens consumed, calls made, seats actively engaged. Darren Yee at NYU made the point well in the Bloomberg piece: you cannot take one month of subscription and multiply by twelve when most of the bill is usage. The lumpiness is structural, not transient. Companies layer nominal subscriptions on top of usage-based billing and report the combined number as ARR, but the usage portion behaves nothing like a recurring annuity.

Third, front-loading. A 12-month prepaid contract signed today can be reported as $X of ARR on day one, even though the customer has 11 months left to decide whether to renew. The accounting is technically defensible. The economic substance — the real-world claim about revenue stability — is materially weaker than the number suggests.

Put these three together, and the same nominal ARR figure can describe radically different underlying businesses. That’s the ambiguity Roy Lee exploited — clumsily, with a 35% lie that was easy to falsify. The more durable problem comes from founders who don’t lie at all, who pick the most flattering legitimate definition each time, and whose numbers nonetheless overstate true recurring economics by 20–40%.

Why VC due diligence doesn’t close the gap

The standard answer — and the one I gave in the Bloomberg piece — is that VC and acquirer due diligence is supposed to be the cop on the beat. In principle, that is right. In practice, the incentives don’t align as cleanly as the model assumes.

Will Gornall and Ilya Strebulaev’s Squaring Venture Capital Valuations with Reality (Journal of Financial Economics, 2020) showed that unicorn valuations are overstated by an average of about 48% once preferred share terms are properly priced. The mechanism is what matters: VCs and founders both benefit from the headline number, and the LPs who would in principle care are not at the diligence table. ARR has the same structure. A VC who marks her portfolio to ARR, raises her next fund partly on those marks, and competes for allocation in the next hot round has limited incentive to demand that founders disclose cohort-level retention. The founder doesn’t want to. The other VCs in the round don’t want to. The LP — the only party with skin in the game on the truth of the number — sees the marks and not the underlying.

This is a classic institutional-design problem. A metric is informative only if some actor in the system has both the ability and the incentive to verify it. In public markets, that role is played by auditors, the SEC, short sellers, and enforcement actions. As an independent director and Remuneration Committee chair on a Hong Kong–listed public company, I see what that machinery looks like up close — quarterly review cycles, named auditor liability, regulator inquiries that come on a predictable cadence. In private markets, the equivalent infrastructure has never been built, because for most of the venture industry’s history it didn’t need to be: funds were small, LPs were sophisticated, capital was patient. None of those conditions still hold.

This is partly an American problem

It is worth pausing to note that the convention I have been describing is largely an American one. I co-direct the Stanford Technology Ventures Program (STVP) for international entrepreneurship, and through STVP’s global programs we run field research and teaching across six continents. From that vantage point, the parochial nature of “ARR as universal yardstick” is hard to miss.

European venture markets, with more conservative LP bases and a stronger founder accounting culture, tend to push cohort-level disclosure into the diligence process earlier. Singapore family offices — which have grown into a meaningful share of the global LP pool over the past decade — increasingly include net retention reporting in fund-level terms. Chinese AI startups face the opposite pressure: their domestic disclosure regime is tightening through STAR Market and HKEX scrutiny even as Western VCs grow more permissive about ARR ambiguity. Israeli founders, who typically raise from US funds, end up triangulating between conventions, and Indian founders increasingly do the same.

None of these ecosystems has solved the problem. But the “Wild West” framing applies most squarely to American venture finance in 2026, and reform may well come from outside it. The work I have done with collaborators on how institutional environments and industrial policy shape entrepreneurial outcomes — particularly comparing the US and Chinese ecosystems — keeps returning to the same lesson: convention is local, capital is global, and when those two collide the convention usually moves first. If the largest non-US LPs continue to formalize cohort retention as a reporting term, US GPs will follow.

The case against the obvious fix

The obvious response is “audit them” — extend GAAP-style requirements down into seed and Series A. I don’t think that’s right, and I told Annie so for the piece. The cost of imposing audit machinery on a 12-person company is real. It would push out exactly the kind of high-variance experimentation that produces the small number of category-defining outcomes that matter. Work I did with Bill Miller estimating the economic impact of Stanford alumni–founded companies puts the annual revenue from that single university’s graduates on a scale comparable to the GDP of a top-ten global economy, and STVP’s global programs have reached over 200,000 students with that same entrepreneurial training across six continents. Most of the value comes from a thin tail. Choking off the experimentation at the base of the funnel to police a metric problem at the top is the wrong trade.

What would actually work is lighter and more institutional in character.

Cohort retention norms. The single highest-leverage move is for the largest LPs — public pension funds, sovereigns, university endowments — to begin asking, as a condition of allocation, that their GPs disclose net revenue retention by cohort for their portfolio companies. The mechanic is straightforward: take all customers who signed up in a given month, track what that same group is paying twelve months later, and report the ratio. Best-in-class SaaS lands at 120%+; healthy is 100–115%; below 90% means a leaky bucket regardless of what the headline ARR says. Public SaaS companies routinely disclose this on earnings calls because investors demand it. Private companies don’t, because their LPs have not yet demanded it of GPs and GPs have not yet demanded it of founders. The metric is well-defined, the data already exists in every Stripe and billing system, and it cuts directly through each of the three structural problems above. No regulation required. The change in equilibrium would happen in a quarter.
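A minimal Python sketch of that cohort calculation, assuming payment records of the form (customer, signup month, billing month, amount). The record layout, function names, and sample data are hypothetical. The bands in `classify_nrr` follow the thresholds given above, and values that fall between those bands are left unlabeled rather than guessed.

```python
# Hypothetical record layout: (customer_id, signup_month, billing_month, amount).
# Months are integer indices (0 = first month of the data).
Payment = tuple[str, int, int, float]


def cohort_nrr(payments: list[Payment], cohort_month: int, horizon: int = 12) -> float:
    """Revenue from one signup cohort `horizon` months later, divided by
    that same cohort's revenue in its signup month."""
    start = 0.0
    later = 0.0
    for _customer, signup, month, amount in payments:
        if signup != cohort_month:
            continue  # customers from other cohorts never enter the ratio
        if month == cohort_month:
            start += amount
        elif month == cohort_month + horizon:
            later += amount  # includes expansion; churned customers add nothing
    if start == 0:
        raise ValueError("cohort has no revenue in its signup month")
    return later / start


def classify_nrr(nrr: float) -> str:
    """Map an NRR ratio onto the bands described in the text."""
    if nrr >= 1.20:
        return "best-in-class"
    if 1.00 <= nrr <= 1.15:
        return "healthy"
    if nrr < 0.90:
        return "leaky bucket"
    return "between the stated bands"


if __name__ == "__main__":
    sample = [
        ("a", 0, 0, 1_000.0), ("b", 0, 0, 1_000.0), ("c", 0, 0, 1_000.0),
        ("a", 0, 12, 1_500.0),  # expanded
        ("b", 0, 12, 1_800.0),  # expanded
        # "c" churned: no payment in month 12
        ("d", 6, 12, 4_000.0),  # later cohort, excluded from cohort 0
    ]
    nrr = cohort_nrr(sample, cohort_month=0)
    print(f"Cohort 0 NRR: {nrr:.0%} ({classify_nrr(nrr)})")
```

Note that new customers signing up later (customer "d") cannot inflate the ratio, which is exactly why this number is harder to dress up than headline ARR.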

Acquirer playbook updates. The corp dev teams at the strategics doing AI acquisitions should standardize on a “true ARR” calculation that strips out trials, prorates front-loaded contracts, and discounts the usage portion. Several already do. Publishing the playbook would normalize it.
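One way such a playbook could be sketched in Python, under explicit assumptions: "prorating" is read here as restating a multi-period prepaid contract on a 12-month basis, the 50% usage haircut is an arbitrary placeholder rather than any published standard, and the `Contract` fields are hypothetical. Real corp dev teams will define each adjustment their own way.

```python
from dataclasses import dataclass

# Placeholder haircut on usage-based revenue; not a published standard.
USAGE_DISCOUNT = 0.5


@dataclass(frozen=True)
class Contract:
    subscription_value: float  # total subscription value booked for the term
    term_months: int           # length of the prepaid / committed term
    annual_usage: float        # usage-based billing, annualized
    is_trial: bool             # trial or pilot not yet converted


def headline_arr(contracts: list[Contract]) -> float:
    """The flattering version: book every contract's full value, trials included."""
    return sum(c.subscription_value + c.annual_usage for c in contracts)


def true_arr(contracts: list[Contract], usage_discount: float = USAGE_DISCOUNT) -> float:
    """Strip trials, restate prepaid value per 12 months, haircut usage."""
    total = 0.0
    for c in contracts:
        if c.is_trial:
            continue  # strip trials entirely
        annualized = c.subscription_value * 12 / c.term_months  # 12-month basis
        total += annualized + c.annual_usage * usage_discount  # discount usage
    return total


if __name__ == "__main__":
    book = [
        Contract(240_000, 24, 0, False),       # two-year prepaid deal
        Contract(100_000, 12, 80_000, False),  # subscription plus usage
        Contract(60_000, 12, 0, True),         # trial
    ]
    print(f"Headline ARR: ${headline_arr(book):,.0f}")  # $480,000
    print(f"'True' ARR:   ${true_arr(book):,.0f}")      # $260,000
```

The specific discount matters less than the fact that it is written down: once the adjustments are explicit, two acquirers looking at the same book can at least argue about the same numbers.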

Disclosure-not-audit. Chris Sloan’s line in the Bloomberg piece — always err on the side of disclosing too much rather than too little — is the right ethical norm and is also, in expectation, the right strategic norm. Founders who disclose more get the benefit of the doubt the next time something looks off. Founders who disclose only the favorable number get re-priced harshly when the market turns, which it eventually will.

Why the ethics framing is necessary but not sufficient

Founder ethics matters, and it runs as a serious thread through STVP’s programming — from the Entrepreneurial Thought Leaders (ETL) speaker series, where founders regularly walk through the hard calls they got wrong, to the Xfund Ethics Fellows Program, the student-led cohort program built specifically around developing the personal principles entrepreneurs will lean on when the pressure to overstate is greatest. The Cluely confession will almost certainly show up as a teaching case in the next iteration of MS&E 272, the global entrepreneurship course I co-teach with Vimbayi Kajese. Students need to wrestle with these moments early, before they’re sitting in the chair Roy Lee was sitting in.

But individual ethics is the wrong layer at which to expect this problem to resolve at the system level. Even fully ethical founders pick the most flattering legitimate definition each time; the question is whether the institutions around them — VCs, LPs, acquirers, journalists, faculty — reward or penalize that picking. That is an institutional question, not a character question. We can teach character all day, and we should. We will not teach our way out of a measurement convention that every party with a seat at the table is incentivized to leave ambiguous.

The right concept is earnings quality

A sharper way to put all of this — credit to Ben Hallen, who pointed this out after the first version of this essay went up — is that what private markets are missing is a concept of earnings quality for ARR.

Earnings quality is a well-established idea in financial accounting. Two companies can report identical earnings under GAAP and have those earnings mean radically different things in terms of how durable they are, how much they reflect underlying economic activity versus accounting choices, and how confidently an investor should extrapolate from them. Public-market analysts spend a lot of time asking about earnings quality. They look at accruals, at deferred revenue, at one-time items, at the relationship between reported earnings and operating cash flow. The headline number is the start of the conversation, not the end.

ARR has no such concept attached to it. Two AI startups can report the same $5 million ARR and have wildly different ARR quality. The first startup’s customers signed annual contracts after a six-month sales cycle and will retain at 95% next year. The second startup’s customers signed three-month trials in the last quarter, and 60% of them are likely to churn at first renewal. Same nominal number, different earnings quality, different actual business.
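The arithmetic behind that comparison is simple enough to write down. A minimal sketch, treating 95% and 40% (one minus the 60% churn) as the expected share of this year's ARR that survives into next year, and applying each rate to the whole $5 million for simplicity; a real quality-of-earnings review would estimate those rates from cohort data rather than assume them.

```python
def expected_next_year(arr: float, retention: float) -> float:
    """Headline ARR weighted by the share expected to survive renewal."""
    return arr * retention


HEADLINE = 5_000_000  # both startups report the same $5M ARR

startup_a = expected_next_year(HEADLINE, 0.95)  # annual contracts, long sales cycle
startup_b = expected_next_year(HEADLINE, 0.40)  # recent three-month trials, 60% churn

if __name__ == "__main__":
    print(f"Startup A: ${startup_a:,.0f}")
    print(f"Startup B: ${startup_b:,.0f}")
```

Under these assumptions, the same headline number implies roughly $4.75 million versus $2 million of revenue a year out.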

What is striking, as Ben pointed out in the comment that prompted this section, is that quality of earnings analysis is already common practice in another part of the deal economy: when individuals buy small businesses. The standard small-business acquisition playbook involves a “quality of earnings” review — a financial professional digs into the underlying economics, separates durable revenue from one-time effects, and tests whether the seller’s reported numbers actually describe what the buyer is buying. The buyer pays a few thousand dollars for the analysis and treats it as table stakes. That a Main Street acquirer of a $2 million HVAC business gets a more rigorous earnings-quality review than a venture investor putting $20 million into a $5 million ARR AI startup tells you something specific about the institutional design of private markets at the higher end.

The most sophisticated venture investors and acquirers do, in practice, surface ARR quality in diligence — they ask for cohort retention data, they probe the contract structure, they discount usage-based revenue. The question is why this practice has not become standard, and why the headline ARR number continues to set the terms of debate. The answer, again, is institutional. The Main Street buyer of an HVAC company has every incentive to know what they are buying because the wrong answer ruins their year. The venture investor marking a portfolio to ARR has weaker incentives — the headline number gets them the markup, the markup gets them the next fund, and the truth of the underlying earnings quality only matters if and when the position is realized, often years later.

Cohort retention is the metric that surfaces ARR quality. So is the share of revenue that is usage-based versus subscription. So is the percentage of contracts that are prepaid annually versus monthly. None of these are exotic — they are routine in public-market disclosure for SaaS companies and they are routine in small-business acquisition diligence. They are missing from venture-stage practice almost entirely. The fix is not a new metric. It is the application of an old discipline to a new asset class.

This reframing also clarifies why the lighter-touch interventions I described above are likely to work. Cohort retention disclosure is exactly the kind of additional context that allows sophisticated investors to assess earnings quality without imposing audit overhead. It is the venture-stage equivalent of asking a public company to break out recurring versus one-time revenue. The information is cheap to produce, hard to game once standardized, and dramatically improves the signal-to-noise ratio of the headline number.

A larger point about metrics and ecosystems

Step back from ARR specifically. The deeper pattern is that entrepreneurial ecosystems develop measurement conventions during a period of relative stability, those conventions get embedded in deal terms, fund marks, press coverage, and recruiting pitches, and then the underlying business changes and the convention drifts from the thing it was meant to measure. The convention persists because too many actors are now invested in it.

This is not unique to ARR. It happened with daily active users in social media, gameable through engagement-loop design. It happened with gross merchandise value in e-commerce, gameable through subsidized transactions. It happened with monthly recurring revenue in early SaaS, gameable through one-time fees disguised as subscriptions. Each cycle, the ecosystem eventually develops a sharper metric — net revenue retention, contribution margin, organic DAU — usually after a public blowup forces the issue.

ARR is in the early innings of that correction. Cluely is the public blowup. The next 18 months will show whether the ecosystem develops the disclosure norms that would let ARR remain useful, or whether the metric becomes so degraded that sophisticated investors quietly stop using it and a new one takes its place.

Either outcome is fine. The one to avoid is the middle path — everyone keeps reporting ARR, everyone privately knows it’s unreliable, and the gap between the number and reality keeps widening until the next downturn forces the reckoning all at once.

Thanks to Annie Bang at Bloomberg for the original reporting and the conversation that prompted this longer treatment, and to Marina Temkin at TechCrunch for the original Cluely reporting that started the thread. The framing I lean on here owes a great deal to Will Gornall and Ilya Strebulaev’s work on private market valuations, which remains a solid academic anchor for thinking about this class of problem.

Chuck Eesley is a Professor of Management Science & Engineering at Stanford University and co-director (for international entrepreneurship) of the Stanford Technology Ventures Program (STVP).

The “In-Box Congestion” Crisis: Why AI Entrepreneurship Needs a Mechanism Design Overhaul

Chuck Eesley — Wed, 04 Mar 2026 02:43:34 GMT

After Gautam Ahuja’s talk on signaling theory, a conversation with Itai Ashlagi, and Tom Mitchell’s presentation on AI history at the Stanford Digital Economy Lab, something crystallized: we are teaching the next generation of founders exactly the wrong lesson.

Right now, entrepreneurship education teaches AI as a Generator:

❌ Generate a slide deck.

❌ Generate a business model.

❌ Generate 1,000 “bespoke” cold DMs.

The result? Total market congestion. When the marginal cost of personalized outreach drops to zero, the value of that outreach drops to zero. We’ve turned the venture ecosystem into a high-speed noise machine.

But the deeper problem isn’t spam. It’s structural.

Steve Blank’s great contribution was replacing “here’s my plan” with “get out of the building.” Lean Startup methods moved founders from storytelling to customer discovery — from assertion to evidence. That was the right shift for its era.

But hypothesis-testing frameworks have always had a foundational weak point baked in: they rely on founders to honestly convey what they found. In game theory, this is called cheap talk — assertions that are costless to make, impossible to verify, and systematically biased toward the result the speaker wants to be true. A founder does 15 customer interviews, gets ambiguous signals, and reports “strong early validation.” No fraud. Just the entirely human tendency to weight confirming evidence more heavily than disconfirming evidence.

AI doesn’t introduce that problem. It industrializes it.

The synthesis is cleaner. The narrative more coherent. The gap between what customers actually said and what the deck concludes they meant has never been easier to paper over — without any intent to deceive. Agentic AI turns motivated reasoning into a polished deliverable.

Spence’s insight from signaling theory cuts right to it: a signal is only credible if it is costly to fake. Cheap talk, by definition, fails this test. And right now, almost everything we’re teaching founders to produce — the pitch, the persona, the discovery summary, the MVP demo — has become cheap talk. Not because founders are dishonest, but because the mechanism was always under-designed, and AI has exposed the flaw at scale.
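Spence's condition can be reduced to a one-line test. The sketch below is a deliberately simplified, binary version rather than a full equilibrium analysis: a signal separates genuine senders from imitators only if it is worth sending for the genuine type and not worth faking for the other. The function name, parameter names, and numbers are illustrative assumptions.

```python
def signal_separates(benefit: float, cost_genuine: float, cost_faker: float) -> bool:
    """Simplified Spence-style separation check.

    benefit:      payoff from being believed (e.g. a follow-on meeting or term sheet)
    cost_genuine: cost of sending the signal for a sender who really has the quality
    cost_faker:   cost of sending the same signal for a sender who does not
    """
    worth_it_for_genuine = cost_genuine <= benefit
    not_worth_faking = cost_faker > benefit
    return worth_it_for_genuine and not_worth_faking


if __name__ == "__main__":
    # Cheap talk: costless for everyone, so it cannot separate anyone.
    print(signal_separates(benefit=10, cost_genuine=0, cost_faker=0))   # False
    # Costly signal: affordable for the genuine sender, prohibitive to fake.
    print(signal_separates(benefit=10, cost_genuine=3, cost_faker=25))  # True
```

When AI drives the cost of producing a polished deck or discovery summary toward zero for everyone, `cost_faker` collapses toward `cost_genuine`, and the test fails no matter how good the output looks.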

To be clear, this isn’t an argument against structured frameworks.

Bill Aulet’s Disciplined Entrepreneurship and MIT’s Orbit/JetPack tool represent exactly the right instinct — grounding AI in a rigorous, proven process rather than letting it run loose. JetPack accelerates founders through 24 steps of structured analysis in hours instead of weeks. That matters.

But there’s a warning that cuts to the heart of it: with AI, it’s never been so fast to run in the wrong direction. Acceleration is not verification. The next evolution isn’t faster generation of better outputs — it’s a different question entirely: how do we know the outputs are true?

The progression looks like this:

∙ Blank: Get out of the building (replace assertion with evidence)

∙ Aulet/JetPack: Move through the evidence-gathering faster (structured AI-accelerated generation)

∙ The next step: Make the evidence harder to manufacture (AI as verifier, not generator)

Each era inherits the previous one’s tools and exposes their blind spot. Lean Startup exposed the business plan. JetPack exposed the unstructured process. The mechanism design overhaul exposes the cheap talk embedded in both.

So what do we actually teach instead?

The answer isn’t to abandon hypothesis testing. It’s to close the loop that Lean Startup left open — the verification loop. We should be teaching founders four things:

1. Costly Signal Design.

Not every signal needs to be expensive — but the signals that matter most need to be hard to fake. This means teaching founders to design their validation process around evidence that carries real costs: a Letter of Intent that required a legal signature, a pilot that required a customer to reallocate budget, a co-development agreement that required someone to show up. These are signals that carry weight precisely because they required something from the other party, not just from the founder.

2. Separation of Synthesis from Evidence.

Founders should present raw customer data — recordings, verbatim quotes, decision logs — separately from their interpretations of it. AI can be genuinely useful here, not as a synthesizer that smooths over contradictions, but as an auditor that surfaces them: “Three of your fifteen customers said the opposite of your headline finding. Here they are.” The tool serves the verification function, not the narrative function.

3. Adversarial Simulation Before Real-World Exposure.

Before a founder runs a single customer interview, AI can stress-test their assumptions — not by generating favorable personas, but by playing the skeptic. A well-designed simulation steelmans every reason a customer wouldn’t buy, a competitor would win, or the unit economics wouldn’t hold. The founder who has survived 50 adversarial AI interviews arrives at their first real customer conversation with sharper hypotheses and a much higher signal-to-noise ratio in what they’re listening for. The output isn’t a polished narrative. It’s a set of refined, falsifiable bets.

4. Mechanism Design Thinking.

The most underrated skill we can teach founders isn’t prompting — it’s system design. Who has an incentive to tell you the truth, and under what conditions? What would a customer have to give up to signal genuine intent versus polite interest? How do you structure an interaction so that a “yes” means something? These are mechanism design questions, and they belong in every entrepreneurship curriculum alongside customer discovery and financial modeling.
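Item 2 above, using AI as an auditor rather than a synthesizer, can be sketched in a few lines. This minimal Python version assumes each interview has already been coded (by a person or a model) as supporting, contradicting, or ambiguous relative to the headline claim; the `Interview` type, the stance labels, and the sample quotes are all hypothetical. The design point is that raw quotes stay attached to the output, so contradictions surface verbatim instead of being averaged into the narrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interview:
    customer: str
    verbatim: str  # raw quote, kept separate from any interpretation
    stance: str    # "supports", "contradicts", or "ambiguous" vs. the headline claim


def audit(headline: str, interviews: list[Interview]) -> str:
    """Report every interview that contradicts the headline, quoting it verbatim."""
    against = [i for i in interviews if i.stance == "contradicts"]
    lines = [f"{len(against)} of your {len(interviews)} customers contradicted: {headline!r}"]
    lines += [f'  {i.customer}: "{i.verbatim}"' for i in against]
    return "\n".join(lines)


if __name__ == "__main__":
    notes = [
        Interview("Customer 1", "We'd pay for this tomorrow.", "supports"),
        Interview("Customer 2", "Honestly, a spreadsheet does this for us.", "contradicts"),
        Interview("Customer 3", "Maybe, if procurement approves it.", "ambiguous"),
    ]
    print(audit("Customers urgently need this tool", notes))
```

The output deliberately omits the supporting quotes: the founder already remembers those.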

Mitchell observed that technical forces eventually outpace social ones. The technical force of 2026 is Agentic AI. The social challenge is Trust. And trust, at its core, is a mechanism design problem — not a content generation problem.

We don’t need more founders who can generate a compelling narrative. We need founders who can build systems that make the truth easier to tell than to obscure.

The future of entrepreneurship isn’t about being the loudest. It’s about being the most verifiable.

#AI #Entrepreneurship #MechanismDesign #SignalingTheory #LeanStartup #DisciplinedEntrepreneurship #VentureCapital

The Role of Institutional Trust in Shaping Entrepreneurial Intent

Chuck Eesley — Sun, 16 Feb 2025 22:01:16 GMT

Institutional trust plays a crucial role in shaping economic and entrepreneurial outcomes, yet its effects are often overlooked in discussions about startup ecosystems and policy interventions. Our research (Eesley & Lee, 2023) highlights how institutional trust influences not only firm formation but also long-term venture success. By examining large-sc…