Levelling the playing field
“Looking at these results, I do not think enough people are considering what it means when a technology raises all workers to the top tiers of performance,” Ethan Mollick, one of the report’s co-authors and an associate professor of management at the Wharton School, wrote in a separate blog post.
“It may be like how it used to matter whether miners were good or bad at digging through rock … until the steam shovel was invented and now differences in digging ability do not matter any more.
“AI is not quite at that level of change, but skill levelling is going to have a big impact.”
The study, which also included contributions from researchers at Boston Consulting Group, Warwick Business School and MIT Sloan School of Management, separated the consultants into three groups and asked them to complete two experimental tasks.
The first group worked without AI, the second were given access to GPT-4, and the third were given access to GPT-4 along with instructional videos and documents on how to use it effectively.
Success within the ‘AI frontier’
The first experiment asked participants to conceptualise a footwear idea for niche markets and delineate “every step involved, from prototype description to market segmentation to entering the market”.
Participants were required to complete 18 tasks, or as many as they could within the given time frame, across four broad domains: creativity (e.g. “propose at least 10 ideas for a new shoe targeting an underserved market or sport”); analytical thinking (e.g. “segment the footwear industry market based on users”); writing proficiency (e.g. “draft a press release marketing copy for your product”); and persuasiveness (e.g. “pen an inspirational memo to employees detailing why your product would outshine competitors”).
An executive from a global footwear company confirmed the tasks covered all the steps the company would usually take when going from product ideation to launch. And Dr Dell’Acqua said the use of “AI proved to be highly beneficial in enhancing performance across all four domains”, with the AI-assisted consultants completing 12.2 per cent more sub-tasks, on average, and producing work of 40 per cent higher quality, as assessed by human graders, than those in the control group.
The results of the second experiment, however, suggest that relying too heavily on AI can backfire.
The researchers designed the second experiment so it was beyond the capabilities of GPT-4, or outside its “frontier” – an invisible wall that Associate Professor Mollick said was difficult for users to determine and could only be located through regular experimentation.
‘Falling asleep at the wheel’
For this task, participants had to use interviews with company insiders and financial data from a spreadsheet to pinpoint which of a hypothetical company’s brands held the most potential for growth.
The participants’ responses were deemed either correct or incorrect: only one brand held the most potential for growth. The report said: “Subjects in the control group were correct about this exercise about 84.5 per cent of the time, while the AI conditions scored at 60 per cent and 70 per cent (for an average decrease of 19 percentage points when combining the AI treatment conditions and comparing them to the control condition).”
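The arithmetic behind the report's parenthetical can be sanity-checked with a simple unweighted average of the two AI conditions (a sketch only – the report's ~19-point figure presumably reflects the actual sizes of the two treatment groups, which are not given here):

```python
# Accuracy figures on the task outside the AI "frontier", as quoted from the report.
control_accuracy = 84.5                  # per cent correct, no AI
ai_condition_accuracies = [60.0, 70.0]   # the two AI treatment conditions

# Unweighted combination of the two AI conditions (assumes equal group sizes).
combined_ai = sum(ai_condition_accuracies) / len(ai_condition_accuracies)
decrease = control_accuracy - combined_ai

print(f"Combined AI accuracy: {combined_ai:.1f}%")         # 65.0%
print(f"Decrease vs control: {decrease:.1f} percentage points")  # 19.5
```

An equal-weights average gives a decrease of 19.5 percentage points, in line with the roughly 19-point drop the report describes.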
“At the same time, their completion time for the task was reduced by over 20 per cent,” Dr Dell’Acqua said. “So, interestingly, for tasks outside the AI frontier, consultants leveraging AI were quicker but sacrificed accuracy in the process.”
Dr Dell’Acqua said the results of the second experiment highlight the danger of professionals relying too heavily on AI and “potentially sidelining their judgement” on tasks that sit beyond the tool’s capability frontier.
“Our findings with consultants working on tasks outside the AI frontier highlight this concern,” Dr Dell’Acqua said.
“They occasionally seem to ‘fall asleep at the wheel’, perhaps because AI-generated responses are so convincing, even when they might be off the mark.”