
Illustration: Brian Stauffer
Cast your mind back to Thanksgiving dinner of 2022, and imagine one of the guests asking the table when an AI you could talk to would get to a million users. The guesses might have covered the usual range—widely adopted AI might be two years away, or ten, or twenty—but none of those answers would have been anywhere near right because, that Thanksgiving, the right answer for million-user AI was “a week from Monday.”
OpenAI dropped ChatGPT, built on its GPT-3.5 model, on the last day of November; by December 5 it had a million users, going on to a hundred million in 2023 and closing in on a billion now. The neural network, the technology underlying large language models like ChatGPT, had been around for decades. OpenAI had been founded in 2015, and the research paper that introduced the current generation of such networks, called transformers, appeared in 2017. (GPT stands for generative pre-trained transformer, a model trained in advance on enormous amounts of text so it can answer a wide range of questions.) Given the long lead time, why didn’t AI use grow gradually, instead of exploding in late 2022?
Surprisingly, the increase in use wasn’t tied to an increase in quality. Versions of OpenAI’s GPT-3 had been available for more than two years with no real uptake. The big change in late 2022 came from the Chat part, not the GPT part. The addition of the chat interface allowed users to bring a lifetime of conversational habits to bear. Instead of explaining what a large language model was, user orientation for ChatGPT became “Just talk to it.” It is almost impossible to resist treating software that talks as something that also thinks. This tendency lulls users into granting the tools an unearned sense of authority, consistency, and even care. If you talk with an AI the way you text with your friend, it tends to make the AI seem like your friend.
The training of large language models involves considerable human feedback about which sorts of answers we prefer. This “reinforcement learning from human feedback” embeds human preferences deep into the models. It is a way to make the answers feel more responsive and helpful, but unfortunately humans prefer confident and flattering answers over merely accurate ones.
All of this is just to say that the arrival of AI as an automated tool rewarding humanlike interaction is a cultural challenge for universities even more than it is a technological one. This is a matter of special urgency to me, as I am the person at New York University most directly tasked with helping faculty and students adapt to new digital technologies in the classroom.
Getting chatted up
The choice to make chat the interface through which users interacted with ChatGPT was a fateful one, because of all the expectations that come along with our experience of conversation. When we talk with another person, we extrapolate a wide range of characteristics and capabilities based on the fact that the only other entity we know of that can talk is another person. If a child can speak in complete sentences, they can also tell you about their extended family. If an adult uses the word “cohort,” they probably have a college degree, and so on. These inferences work well for people, because we have such deep and rich experience of inferring human context from a few clues. However, those same inferences work quite badly for AI, because we overestimate the range and depth of skills an AI must have, based on some of those same clues.
This is the source of the breathless confusion between “An AI can pass the LSAT!” and “An AI can be a lawyer.” I asked one of my lawyer colleagues about the LSAT story; he replied, “I cannot convey to you how little of my day is like answering a question on the LSAT.” The LSAT, like most written tests, is a proxy for a whole bundle of human capabilities; when a person does well on the LSAT, you can also infer things about their ability to persevere at difficult work, or to manage stress. When a large language model does well on the LSAT, you can infer only that it can do well on the LSAT. This is a particular problem for higher education, because of our special requirements around student effort, not just student output.
The hard work of learning
When a new technology is powerful enough, humans can end up giving over some of our autonomy. In one striking example early in the spread of smartphones, a woman walking through Park City, Utah, was using newly available walking directions on her phone. Offered what looked like a shortcut, she took a route that directed her across a four-lane highway with no shoulder, crosswalk, or traffic lights. While crossing, she was struck by a car and badly injured. She sued the provider of the directions; the court ruled against her, on the grounds that providers of directions could not have anticipated anyone would cross a four-lane highway without looking for cars.
Yet she did, on the advice of her phone. Although that incident was extreme, it was not unique in the annals of GPS use. One person drove into a swamp; another into a river; another still into a lake. (All survived.) This phenomenon goes by several names, including automation-induced complacency: the tendency of a user to underestimate a system’s potential for error. It takes a lot of complacency to drive into a lake, but it happens.
Asking what sort of AI-induced complacency might come for our students means asking what human activity it saves us from. One answer is clarity—the need to be precise and detailed about creating something. Do you need to produce an essay, charts or graphs, a presentation? Making those things used to require an iterative, detail-oriented approach—here are the things the essay should include, but in what order? With how much detail? Should the tone be exploratory or assured? Now you can just tell an AI to “describe the effects of new technologies on the Crimean War” (one of my test prompts). The machine will write a competent essay faster than a human could type one.
For many professions, spinning up a piece of writing from a few notes and requests is a blessing. A psychiatrist friend (also Morse ’86) reports using it to write letters to insurance companies after uploading her case notes. But in higher education, the effect of AI is far more mixed, in part because for us, asking students to write is a way of requiring them to think.
If you observe a factory and see that the output is tires, you can then assume that the worker’s effort is there to create the tires. However, if you take that intuition to a history class and conclude that the output is history papers and that the students’ effort is there to create the papers, you have not just misunderstood the situation, you have it backwards. The important output of that classroom is not history papers, it’s historians; the papers are there to create student effort, not vice versa.
Inescapable AI
This would not be such a big issue if adoption of AI had not been so fast and complete. A recent 16-country survey by the Digital Education Council found that five out of six students are using AI in their coursework. The tools have been adopted so widely and so quickly that a generation gap is already forming, with students achieving a facility with generative AI that faculty have not yet matched with updates to curricula or assessment strategies.
Yale’s Poorvu Center for Teaching and Learning, created in 2014, had become so widely trusted among faculty by the time ChatGPT appeared that the center began getting calls for advice and help from faculty that December, just days after the tool’s release. Though Yale faces the same strains of rapid AI diffusion that all colleges and universities in the US do, the center’s connection to both faculty and students has allowed it to tackle the generation gap by reaching out to both groups.
Jennifer Frederick ’99PhD, Poorvu’s executive director and the associate provost for academic initiatives, says student use of AI is sometimes driven by time pressures around extracurricular activities, which risks creating “a toxic situation where students using AI to buy themselves time create peer pressure for other students to do the same.” The best place to reduce the risk of lazy adoption is in the classroom, both because faculty in different disciplines will have different approaches to AI, and because faculty are the ones setting and communicating learning goals for their students.
Getting faculty up to speed on what AI can do, and when it is and isn’t appropriate, is a precondition for deeper work on discipline-specific approaches to AI and curriculum. Yet faculty have typically been hired for disciplinary expertise well outside the domain of AI use, and while some are experimenting with AI in the classroom and elsewhere, experimentation is not universal. Frederick notes that “if faculty are not self-starters with technology, the door they should go through to learn about AI is not obvious.”
This creates a two-track problem common to most institutions of higher education at the moment: Both faculty and students need to become familiar with the tools, but while students often need to limit their use, faculty often need to expand theirs, even if just to gain enough of an understanding to formulate an AI policy. Frederick sees the issue as one of critical thinking about the tools themselves. “Students are using AI all over the place, and they benefit from guidance about how to consider or use AI in discipline-specific settings. To provide that guidance, faculty need a foundational familiarity with the tools.” This two-track problem is made more complex by disciplinary breadth—faculty and students in computer science, comparative lit, and chemistry are going to use the tools differently.
The common approach to disciplinary variability, at Yale, at NYU, and at most universities, is to defer downward to the departments, and within departments, to individual faculty, to make judgment calls about what uses of AI are and aren’t in line with expectations. All of this is complicated by the fact that the use of such tools can neither be fully prevented nor reliably detected.
There are a number of common approaches to this problem: guidelines for students about when to use AI and when not to; guidelines for faculty about how to talk with students about AI use in the classroom; sample syllabus statements for faculty to modify and use; training for both groups. But even that relatively broad set of adaptive options is only a partial response to a change this enormous, in which assessing student output is no longer a good proxy for measuring student insight.
In the interest of helping students and faculty, Poorvu is asking them to help each other. The center has hired a number of “student AI liaisons” (SAIL), who work with faculty to help them do things like revise exams or design assignments that emphasize learning and thoughtfully integrate or avoid AI use. Frederick says that with the SAIL program, “the students’ expertise with AI is less important than the fact that students and faculty are trying something hands-on together.” This intergenerational approach to adapting to AI takes advantage of the fact that conversation, the exchange of ideas, between students and faculty is a core function of higher education.
Learning to fail
On one hand, the integration of AI into the academy has been dizzyingly fast. I have been studying the dynamics of technology adoption and cultural change for decades, and I have never seen anything get adopted this quickly. This has led to a deepening of the generation gap, not just between students and faculty, but between current students and even recent graduates. Many of the undergraduates at my institution will flatly state that they could not get their work done without AI, even though the NYU classes of 1836 through 2022 somehow managed.
On the other hand, integration of AI into university life has been relatively slow. Because these tools change the relationship between the ability to produce writing and the requirement to engage in the hard, messy work of thinking through something, everything from parts of individual assignments to whole curricula is going to have to be adapted, and right now the tools are not even in a stable enough state for us to be confident that any adaptation will hold. And so we get some uses that are obvious and quick wins, and other uses that are making us rethink the nature of education.
The advantage of AI to people like my psychiatrist friend is precisely that it reduces the amount of thought required per unit of output, but that same effect is corrosive to learning. Students are already regarding ChatGPT as an authority, something it took the far more reliable Wikipedia over a decade to achieve. And the constant stream of flattery makes this worse—a recent thousand-user study with flattering and non-flattering AIs concluded that “sycophantic AI models have real capacity to distort people’s perceptions of themselves.”
Giving anyone a conversational partner that is, as another study concluded, 50 percent more flattering than humans is uncharted territory. Early research suggests that the motivation we most need to worry about with our students is not laziness but anxiety about their work, and the tendency to be relieved by tools that offer not just acceptable output but unearned reassurance along the way.