Digital Gems


#economics #research
January 2024 Updates
New Things Under the Sun is a living literature review; as the state of the academic literature evolves, so do we. This (short) post highlights two recent updates.

But before getting into that, a reminder: if you have completed the first year of a PhD in economics or a related field and are interested in innovation, the deadline to apply for the Institute for Progress’ (free) Economics of Ideas, Science, and Innovation Online PhD Short Course is January 9, 2024. More info here.

Where Research Happens Counts

In our post When research over there isn’t helpful here, Caroline Fry and I looked at a series of examples where the applicability of research findings is geographically localized. In this update, we add discussion of a Job Market Paper by Sergio Puerto (see a list of more innovation job market papers here). After discussing some papers that show policy-makers, doctors, and patients have more trust in research conducted in their home countries, the updated post continues…

Lastly, Puerto et al. (2023) documents similar results for agriculture, but looking within one specific country - Costa Rica - rather than across them. Puerto is interested in plant breeding in developing countries, where in many years only a very small number of new and improved plant varieties are released. As implied by our opening to this post, in these countries domestic and international private sector R&D on new plant varieties tends to be low or absent, so that resource-constrained public sector breeding programs are often the primary source of new plant varieties. Breeding of these new varieties happens at experimental research stations. Puerto shows the location of these research stations affects the ultimate levels of adoption of new seeds by Costa Rican farmers.

His main experiment randomizes 800 farmers across 118 villages into different experimental conditions. Some groups are given the opportunity to buy a recommended new bean seed variety, recently bred by the experimental research station. Puerto argues this matches a typical year, where the resource-constrained public breeders recommend a single new seed variety for a large group of farmers. But another group in Puerto’s experiment is given that same opportunity, but also asked to plant three different bean varieties on test plots on their farm, and then given the opportunity to buy whichever of those three test varieties they most prefer. The idea here is to approximate the value of much more local information on seed varieties - soil, topography, and climate vary across farmers and can mean different seeds are suited to different conditions.

Puerto finds the farther a farm is away from the research station where breeding takes place, the more likely it is that farmers will diverge from the research station recommendation and choose an alternative seed. Specifically, the 25% of farms closest to the research station adopted the recommended seed at essentially the same rate as farmers who did not have an opportunity to test other seeds, while the one quarter farthest away were 26% less likely to pick the recommended seed (the average distance was about 150km).
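To make that comparison concrete, here is a minimal sketch of the kind of distance-quartile breakdown described above. The column names and the adoption process are invented for illustration; Puerto's actual analysis uses his experimental data and regression adjustments, not a simulation like this.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 800  # roughly the size of Puerto's experiment

# Hypothetical farm-level data: distance to the breeding station (km) and
# whether the farmer was offered test plots with three alternative varieties.
farms = pd.DataFrame({
    "distance_km": rng.uniform(5, 300, n),
    "tested_varieties": rng.integers(0, 2, n).astype(bool),
})

# Toy adoption process: farmers who tested alternatives drift away from the
# recommended seed as distance grows (illustrative numbers only).
p_adopt = np.where(
    farms["tested_varieties"], 0.8 - 0.3 * farms["distance_km"] / 300, 0.8
)
farms["adopted_recommended"] = rng.random(n) < p_adopt

# Adoption rate of the recommended seed by distance quartile and treatment arm.
farms["distance_quartile"] = pd.qcut(farms["distance_km"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
rates = (
    farms.groupby(["distance_quartile", "tested_varieties"], observed=True)["adopted_recommended"]
    .mean()
    .unstack()
)
print(rates)  # the gap between arms should widen with distance
```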
Read the whole post

Home Bias in Management Research

In our post Geography and what gets researched, Caroline Fry and I looked at evidence that where researchers reside affects their choice of study topic. We’ve now added a discussion of Nagaraj and Yao (2023), which presents new data on management research. The updated post now includes these new paragraphs:

Nagaraj and Yao (2023) provide additional evidence on the focus of management research. Their study focuses on the complete set of articles published in six top management science journals and, among other things, they’re interested in seeing how where the authors work affects what they choose to study. They can infer where the authors work from the location of their employer; to estimate the place(s) under study, they look for city, state, country, and nationality words in the title and abstract. By this method, about 15-25% of articles have some kind of regional focus.

That lets us ask - do researchers tend to study where they are? Yes and no. In the figure below, Nagaraj and Yao (2023) focus on the 13 countries that are among the top ten for either researcher locations or research focus. On the vertical axis we have the researcher’s location; on the horizontal axis, the country under study (note articles can have more than one researcher and research topic). For each cell, they take all the authors from a given country and compute the share of the regions mentioned in their articles that go to a particular country. The darker the shading, the larger the share. If the diagonal line is darkest, that would tell us researchers are most likely to study their own countries.

From Nagaraj and Yao (2023)

We do see a dark diagonal line, consistent with researchers disproportionately studying their home country. But the most striking pattern on this chart is probably the dark vertical line on the right: everyone studies the United States! But setting aside the USA, Nagaraj and Yao’s work does find management researchers tend to be more likely to study their own countries.

Read the whole post

Thanks for reading! As always, if you want to chat about this post or innovation in general, let’s grab a virtual coffee. Send me an email at matt.clancy@openphilanthropy.org and we’ll put something in the calendar.
Teacher Influence and Innovation
This article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

Reminder: If you have finished the first year of a PhD in economics or a related field, you can apply for the (free) ten-week online Economics of Ideas, Science, and Innovation short course. The deadline to apply is January 9, 2024. More details here.

Here’s a striking fact: through 2022, one in two Nobel prize winners in physics, chemistry, and medicine also had a Nobel prize winner as their academic advisor.1 What accounts for this extraordinary transmission rate of scientific excellence? There are two main possibilities. Maybe great students and great teachers seek each other out and tend to work together. Or maybe great teachers give their students resources that make them better scientists: teaching, access to networks, support, etc. Both are probably important to one degree or another. But in this post I’ll focus on an aspect of the second channel: what do we know about how innovative teachers influence their students, and their students’ subsequent innovative careers? I’ll focus on two strands of literature: roughly speaking, how teachers influence what their students are interested in, and how teachers influence the impact of their students’ work.

Interesting Correlations

To start, we’ll establish some correlations between the interests of students and their teachers. Borowiecki (2022) focuses on teacher-to-student transmission of interests among musical composers from 1450-1975; Koschnick (2023) among undergraduates and faculty at Oxford and Cambridge over 1600-1800; Azoulay, Liu, and Stuart (2017) on modern post-docs and their advisors in the life sciences. In the next section, we’ll try to go further and show that these correlations are likely to be in large part about the teacher’s influence on student interests, rather than students sorting themselves to work with teachers who share their interests.

All three papers involve heroic data construction efforts. Borowiecki’s core analysis relies on data about 341 composers, where they lived, what music they wrote, and how impactful their music is (measured by either modern Spotify follows, length of their biographies in a major musical dictionary, or rankings by Charles Murray). Borowiecki also identifies 221 student-teacher connections among this group, where the one taught the other at a music conservatory. Lastly, because Borowiecki has detailed information on the musical themes of his composers, he can algorithmically assess how similar the musical themes of any two composers are.

Borowiecki’s main analysis shows that composers write music with themes that are more similar to the themes of their teachers than to other composers. This effect holds when you restrict the comparisons to other composers living in the same country and alive at the same time as the teacher. He finds this similarity persists for around 20 years, and even across generations: composers write music more similar to the teacher of their teacher than to other composers who might have taught their teacher but didn’t.
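Borowiecki's similarity algorithm isn't spelled out here, but the flavor of the exercise - turn each composer's themes into a numerical fingerprint and compare fingerprints - can be sketched with a toy example. The interval-bigram representation and cosine similarity below are my own illustrative choices, not the paper's actual method.

```python
from collections import Counter
from math import sqrt

def interval_bigrams(theme):
    """Fingerprint a melodic theme (a list of pitches) by counting consecutive
    pairs of melodic intervals, so transposed themes map to the same counts."""
    intervals = [b - a for a, b in zip(theme, theme[1:])]
    return Counter(zip(intervals, intervals[1:]))

def cosine_similarity(c1, c2):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

# Toy themes written as MIDI-style pitch sequences (purely illustrative).
teacher = [60, 62, 64, 62, 60, 67, 65, 64]
student = [62, 64, 66, 64, 62, 69, 67, 66]    # same contour, transposed up
unrelated = [60, 72, 59, 71, 58, 70, 57, 69]  # very different contour

print(cosine_similarity(interval_bigrams(teacher), interval_bigrams(student)))    # close to 1
print(cosine_similarity(interval_bigrams(teacher), interval_bigrams(unrelated)))  # close to 0
```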
Let’s turn to interests in science, which are studied by Koschnick (2023). Koschnick’s analysis builds on a dataset that matches students and faculty at Cambridge and Oxford (over 1600-1800) to a database of publications in England, based on names and birth and death dates (where available). He wants to use these matched publications to infer students’ and faculty’s interest in different areas of science (or other topics): for example, students/faculty with more publications about astronomy are probably more interested in astronomy. To do so, Koschnick trains a large language model to classify publications into topics - he’s helped here by the era’s propensity for very long and descriptive titles.2

Finally, he wants to match students to teachers, to see if being around teachers more interested in a specific area of science makes the student more likely to work on that area. For that, he relies on the college system employed by these universities. Students at these universities belong to one of dozens of colleges, where they live with their college peers and are primarily taught by faculty from their college. Since Koschnick knows which college each faculty member belongs to, he knows with a high degree of certainty which faculty are teaching which students.

Koschnick documents that after they graduate, students tend to publish more on scientific topics that were more common among the publications of the faculty at the college they attended. If the share of faculty publications at your college in one scientific field doubles, then the share of publications in that field written by its students rises by 1-3%. That doesn’t sound like much, but note the average college share of science in any field is tiny - only 0.6%. So doubling the share is quite easy. In fact, the variation across colleges can be much larger than a doubling: one standard deviation in this data is more like a 6x increase over the average.

Finally, Azoulay, Liu, and Stuart (2017) build a dataset on 489 elite life scientist post-doctoral students and their 333 advisors. These post-docs are Pew or Searle Scholars, which is useful because the Pew Scholar Oral History and Archives Project provides extensive documentation on the biography of Pew scholars, which Azoulay, Liu, and Stuart draw on in the analysis discussed in the next section. For now, suffice it to say Azoulay and coauthors show that post-docs who work with advisors who have previously held patents are more likely to seek patents of their own in the future.

Birds of a Feather?

These three papers establish that students appear to share interests with their teachers, whether that interest be a particular style of music, a field of science, or commercializing research. But we haven’t done anything to establish this correlation is down to teacher influence. It might just as easily be that young composers seek out teachers whose music they like, that students go to colleges strong in the subject area they are interested in, and that budding entrepreneurial scientists seek out mentors with experience commercializing their research.

All three papers present evidence that these kinds of explanations are probably not the main story. To begin with, both Borowiecki and Koschnick’s papers involve students making decisions at a relatively young age, before we might imagine they have deeply developed personal preferences. In Borowiecki (2022), 75% of students begin their training at a music conservatory, with their advisor, before the age of 22. Koschnick’s paper focuses on undergraduates. Both papers also primarily take place in eras that predate the information technology revolution, when information about potential teachers was less readily available.
Koschnick’s paper goes on to argue that, instead, undergraduates at Oxford often selected their college based on geographical affinities. For example, in his data, students from Devon and Cornwall are more likely to go to Exeter college and students from Pembroke more likely to go to Jesus college. In one analytical exercise, he shows that students are more likely to write about a given scientific topic if the college that students from their home region usually attend happens to be stronger in that field during the years the student is at university. In that particular exercise, he doesn’t even need to know where students actually ended up going to school, just where they would be predicted to go based on where they live.

For Azoulay, Liu, and Stuart’s study of postdocs and their advisors, they have access to an unusually rich source of information about the decision-making process of their subjects: the oral histories of Pew scholars. The authors read a sample of 62 such histories (each is long; 100-400 pages) to see what kinds of factors Pew scholars self-report as being important in their decision of which postdoc mentor to work with. By far the most important factor cited was the scientific topic being investigated, followed by geography (where the lab was), the advisor’s prestige in the field, and interpersonal rapport. None mentioned the commercial orientation of the advisor, or their interest in patenting. And this wasn’t simply because they were shy to talk about non-academic goals; when asked about their own patents, interviewees were apparently quite candid.

Azoulay, Liu, and Stuart use this qualitative analysis to form the basis of some additional quantitative exercises. They come up with measures of scientific similarity, geographical proximity, and prestige, which they use to derive statistical models of the matching process between postdocs and mentors. They can then see if matches that are poorly explained by these stated factors seem to be unusually correlated with the decision to patent, which would be evidence that people left their true motivations - a desire to work with a scientist who patents - unstated. But they don’t really find any evidence of this. The statistics back up what the scholars say: recent graduates don’t really think about patenting when deciding who to work with for their postdocs. But if they “accidentally” end up working with an advisor with a history of patenting, they’re more likely to patent themselves, later in their career.

Both Borowiecki and Koschnick also perform an exercise based on teacher composition at conservatories and colleges. In one exercise, Borowiecki looks at how similar the musical styles of a student and teacher are, compared to teachers at the same conservatory who either left shortly before the student joined or arrived shortly after the student left. The idea here is that if students had started at the conservatory at a slightly different time they might well have ended up working with this alternative teacher. Koschnick’s study exploits an even...
Innovation Job Market Papers 2023
In this special edition of What’s New Under the Sun, we have a big bundle of the titles, abstracts, and links to innovation-related PhD job market papers from 2023 that I either found or that were sent to me in response to last week’s solicitation (thank you!). This is not an exhaustive list - I am sure I have missed many great papers. If you have a paper that you think belongs on this list, please send it my way following the instructions here, and I’ll add it. I enjoyed reading all these abstracts, and am excited to dig into the papers. Back to our regular programming next week!

Titles Index

Titles are presented in random order.

Do Standard Error Corrections Exacerbate Publication Bias? by Patrick Vu
Machines and Superstars: Technological Change and Top Labor Incomes by Donghyun Suh
I, Google: Estimating the Impact of Corporate Involvement on AI Research by Daniel Yue
Returnee Inventors and Home Country Innovation by Sherry Xue
Executive contracts for sustainable innovation: incentivising gains in wealth and health by Slavek Roller
Measuring Knowledge Capital Risk by Pedro H. Braz Vallocci
Multinational Production and Innovation in Tandem by Jin Liu
Staggered Rollout for Innovation Adoption by Ricardo Fonseca
Spillovers and the Direction of Innovation: An Application to the Clean Energy Transition by Eric Donald
Technology Adoption, Learning by Doing, and Reallocation by T. Jake Smith
The Effect of Funding Delays on the Research Workforce: Evidence From Tax Records by Wei Yang Tham, with Joseph Staudt, Elisabeth Ruth Perlman, Stephanie Cheng
Batman Forever? The role of trademarks for reuse in the US comics industry by Franziska Kaiser, with Alexander Cuntz, and Christian Peukert
Race and Science by Gaia Dossi
When are Patents Traded and Why: A Dynamic Structural Model of Drug Development and Patent Trading by Jie Fang
The Effect of Robot Assistance on Skills by Sungwoo Cho
Worker Mobility, Knowledge Diffusion, and Non-Compete Contracts by Jingnan Liu
Equilibrium IPR Protections, Innovation and Imitation in A Globalized World by Leo C.H. Lam
Public R&D Spillovers and Productivity Growth by Arnaud Dyèvre
Optimal Skill Mixing Under Technological Advancements by Elmer Zongyang Li
STEMming the Gender Gap in the Applied Fields: Where are the Leaks in the Pipeline? by Shasha Wang
Reluctant to Grow: The Unintended Effects of R&D Tax Credits Targeting Small Firms by Alexandre Lehoux
Technological Change and Unions: An Intergenerational Conflict with Aggregate Impact by Leon Huetsch
Innovation-Facilitating Networks Create Inequality by Cody Moser, with Paul Smaldino
Embracing the Future or Building on the Past? Growth with New and Old Technologies by Bernardo Ribeiro
Intangible Assets, Knowledge Spillover, and Markup by Yusuf Ozkara
Money, Time, and Grant Design by Wei Yang Tham, with Kyle Myers
The Effects of the Affordable Care Act on Pharmaceutical Prices, Demand and Innovation by Zhemin Yuan
Multidimensional Skills in Inventor Teams by Hanxiao Cui
Return Innovation: The Knowledge Spillovers of the British Migration to the United States by Davide M. Coluccia, with Gaia Dossi
Information provision and network externalities: the impact of genomic testing on the dairy industry by Victor Funes-Leal, with Jared Hutchins
Reveal or Conceal? Employer Learning in the Labor Market for Computer Scientists by Alice H. Wu
Innovation and Technological Mismatch: Experimental Evidence from Improved Crop Seeds by Sergio Puerto
Relying on Intermittency: Clean Energy, Storage, and Innovation in a Macro Climate Model by Claudia Gentile
Intellectual Mobility Frictions by Jordan Bisset, with Dennis Verhoeven
Consequences of Indian Import Penetration in the US Pharmaceutical Market by Jinhyeon Han
Markups, Firm Scale, and Distorted Economic Growth by Jean-Felix Brouillette, with Mohamad Adhami, Emma Rockall
Teacher-directed scientific change: The case of the English Scientific Revolution by Julius Koschnick
Strategic Network Decisions and Knowledge Spillovers: Evidence from R&D Collaborations of U.S. Firms by Kippeum Lee
The Market Effects of Algorithms by Lindsey Raymond
Decline in Entrepreneurship: A Tale of Two Types of Entrepreneurs by Angelica Sanchez-Diaz
Scale-Biased Technical Change and Inequality by Hugo Reichardt
The Effect of Inventor Mobility on Network Productivity by Brit Sharoni

Titles and Abstracts

Do Standard Error Corrections Exacerbate Publication Bias?
Patrick Vu
Over the past several decades, econometrics research has devoted substantial efforts to improving the credibility of standard errors. This paper studies how such improvements interact with the selective publication process to affect the ultimate credibility of published studies. I show that adopting improved but enlarged standard errors for individual studies can lead to higher bias in the studies selected for publication. Intuitively, this is because increasing standard errors raises the bar on statistical significance, which exacerbates publication bias. Despite the possibility of higher bias, I show that the coverage of published confidence intervals unambiguously increases. I illustrate these phenomena using a newly constructed dataset on the adoption of clustered standard errors in the difference-in-differences literature between 2000 and 2009. Clustering is associated with a near doubling in the magnitude of published effect sizes. I estimate a model of the publication process and find that clustering led to large improvements in coverage but also sizable increases in bias. To examine the overall impact on evidence-based policy, I develop a model of a policymaker who uses information from published studies to inform policy decisions and overestimates the precision of estimates when standard errors are unclustered. I find that clustering lowers minimax regret when policymakers exhibit sufficiently high loss aversion for mistakenly implementing an ineffective or harmful policy. Link

Machines and Superstars: Technological Change and Top Labor Incomes
Donghyun Suh
I construct a model of production hierarchies in which agents and machines differ in skill levels. The skill level of an agent determines the difficulty of work tasks she can perform. Relatively low-skill agents become workers and high-skill agents become managers who help workers perform difficult tasks. Machines can either augment or substitute workers. Two main findings emerge: First, whether machines augment or substitute workers depends on the highest skill level of machines. If machines can perform sufficiently difficult tasks, then machines substitute workers and augment managers. However, if machines can only perform relatively easy tasks, then machines augment workers. Second, for sufficiently advanced machines, technological change increases income concentration at the top.
This occurs as gains from technological change are greater for those with higher skills, thus benefiting the most skilled managers the most. By contrast, if machines augment workers, technological change has the opposite effect on top income shares. The paper further examines the implications of Artificial Intelligence (AI) for managerial functions. If machines can perform more difficult tasks than any worker, they substitute managers. I find that management by machines most significantly raises the wages of the least skilled workers. On the other hand, managers' wages fall, with the decline most pronounced among the least skilled managers. Therefore, while less inequality between workers and managers leads to lower top income shares, the inequality among managers increases. Link

I, Google: Estimating the Impact of Corporate Involvement on AI Research
Daniel Yue
While corporate involvement in modern scientific research is an indisputable fact, the impact of corporate involvement on scientific progress is controversial. Corporate interests can lead to constraints that redirect research activities into applied problems in a way that benefits the company but reduces scientific impact. However, corporations also provide resources such as funding, data sets, collaborators, engineers, and technical problems that researchers may otherwise be unable to access or know about, spurring knowledge creation. This paper empirically assesses the impact of corporate involvement on scientific research by focusing on dual-affiliated artificial intelligence researchers located at the intersection of academia and industry. After controlling for the researcher's quality and topic preferences, I find that corporate involvement leads to up to a 44% increase in field-weighted citations received by a paper. I document evidence that this effect arises because the average benefit of a firm's scientific resources exceeds the cost of that firm's scientific constraints. Specifically, I show that corporate involvement significantly increases the likelihood of a breakthrough paper and that these effects are magnified by the involvement of firms with greater resources. However, corporate involvement also alters the direction of the dual-affiliate author's research to be more aligned with the firm's commercial interests. This is the first large-scale quantitative study of any field of science to demonstrate a direct positive effect of corporate involvement on science or to describe the underlying mechanism. Link

Returnee Inventors and Home Country Innovation
Sherry Xue
I analyze the innovations produced by Chinese companies and research organizations (”receivers”) after hiring returnee inventors – Chinese inventors who returned from abroad. Following their return, receivers significantly increase patenting and the number of involved inventors in technological fields where the returnee has experience. However, the new patents receive fewer citations, especially from abroad. Additionally, there is a decrea...
Two Announcements for PhD students
Dear readers,

Regular programming will be back next week or so; today’s post is two quick announcements that may be of interest to readers working on or completing PhDs.

#1. The Economics of Ideas, Science, and Innovation PhD Short Course

This spring, the Institute for Progress is once again organizing a free online PhD short course on the Economics of Ideas, Science, and Innovation. We did this last year and it went really well. This year we have expanded our roster of fantastic lecturers: Pierre Azoulay, Janet Freilich, Ina Ganguli, Ben Jones, Chad Jones, Kyle Myers, John Van Reenen, Caleb Watney, Heidi Williams, and me.

The course consists of weekly assigned reading groups, a group slack, 10 two-hour zoom lectures, and an option to attend small group meetings with an instructor. The live zoom lectures will be held from 1:30-3:30pm ET on Tuesdays, starting at the end of January. The course is meant to cover the same kind of material that would be covered in a second-year economics PhD field course, and so our target audience for this course is students who have completed at least one year of a PhD in a related field.1 While the course is free, you’ll need to do a little bit of work to apply and signal interest. The application deadline is January 9. Learn more and apply here!

#2. Share Your Innovation Job Market Paper

If you are a PhD (or recent PhD) who is going on the job market this year, I would like to invite you to send me the title, abstract, and link to your job market paper, if it is related to the social science of innovation and science.2 Basically the kind of thing that would be interesting to readers of New Things Under the Sun. Next week, I’ll bundle all the responses together and send out a special post with all of the new innovation job market papers. New Things Under the Sun has more than 14,000 readers now, so it’s a good way to get your work in front of a lot of readers interested in these topics!

If you would like to participate, please email matt.clancy@openphilanthropy.org with the subject line “JMP post:” and then the title of your paper. In the body of the email, please include your paper title, your name (+ the names of any coauthors), an abstract, and a link to where people can read the paper. If you want to be in the JMP post, please email me your details by end of day on December 5. Lastly, please share this invitation widely with anyone you think might be interested. There is no need to be a subscriber to New Things Under the Sun to submit.

Cheers,
Matt

P.S. As always, if you want to chat about this or innovation in general, let’s grab a virtual coffee. Send me an email at matt.clancy@openphilanthropy.org and we’ll put something in the calendar.

1 If this isn’t you, note that we will make slides publicly available. However, because we want the zoom meetings to be interactive discussions with students, we don’t plan on releasing recordings of them. But we are exploring ways to make this material available in other ways.

2 If you are going on the market and have innovation-related work you want to share, but it isn’t your job market paper, feel free to send it anyway.
When Research Over There Isn't Helpful Here
This post was jointly written by me and Caroline Fry, assistant professor at the University of Hawai’i at Manoa! Learn more about my collaboration policy here. This article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

According to most conventional measures of scientific output, the majority of global research takes place in a handful of countries. In the figure below, we pulled data on three measures of R&D efforts across every country in the world: the number of scientific/technical articles published by researchers in a country, the number of researchers engaged in R&D in a country, and R&D spending by country. We then combined that data with information on the population of every country to create the following chart, which shows the share of R&D occurring in countries with a given share of the earth’s population.

Based on data from Our World in Data - source file here

According to this data, countries with about 12% of the world’s people produce half the world’s research. On the other side of the coin, half the world’s population resides in countries that collectively produce about 9% of scientific articles. The ratios are even more skewed if we rely on data on R&D spending or the number of researchers. Put differently, much of the world’s population lives in countries in which little research happens.
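As a mechanical aside, a concentration chart like the one above can be built by ranking countries from most to least research-intensive and accumulating shares, as in the sketch below. The country names and numbers here are invented; the post's actual source file is linked in the figure note.

```python
import pandas as pd

# Hypothetical country-level data (population in millions, articles in thousands).
countries = pd.DataFrame({
    "country":    ["A",  "B",  "C",   "D", "E",  "F"],
    "population": [50,   300,  1400,  80,  200,  1100],
    "articles":   [120,  600,  700,   20,  60,   100],
})

# Rank countries from most to least research-intensive, then accumulate shares
# of world population and of world articles.
countries["articles_per_capita"] = countries["articles"] / countries["population"]
countries = countries.sort_values("articles_per_capita", ascending=False)
countries["cum_pop_share"] = countries["population"].cumsum() / countries["population"].sum()
countries["cum_article_share"] = countries["articles"].cumsum() / countries["articles"].sum()

# Each row now answers: countries holding this cumulative share of the world's
# people (counting from the most research-intensive down) produce this share of articles.
print(countries[["country", "cum_pop_share", "cum_article_share"]].round(2))
```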
Is this a problem? According to classical economic models of the “ideas production function,” ideas are universal; ideas developed in one place are applicable everywhere. If this is true, then where research takes place shouldn’t be a problem. Indeed, if research benefits from clustering, we would actually prefer to concentrate our research communities into a small number of places.1

This is probably true enough for some contexts. But there are at least two problems here. First, as has been well established in the literature on technology diffusion, there are significant frictions associated with the diffusion of knowledge over geographic distances.2 Second, and what we plan to discuss in this post, research may be less useful in countries where it did not occur – or, nearly as consequential, people may believe this to be so. In this post we’ll look at four domains - agriculture, health, the behavioral sciences, and program evaluation research - where new discoveries do not seem to have universal application across all geographies.3

Different places, different problems

In a previous post we discussed some evidence that researchers tend to focus on problems in their local area. To the extent that the prevalence of problems varies around the world, this could mean that the distribution of researchers influences the levels of research to solve problems in some locations (irrespective of diffusion). If the problems of places with few researchers differ from the problems of places with many, then some problems will be under-researched if researchers focus on what’s happening locally.

So the first important question is: do problems vary around the world? Of course they do. We can start with Moscona and Sastry (2022), which documents that the prevalence of crops and pests varies around the world. In a first step of their analysis, they use a dataset on international crop pests and pathogens from the Centre for Agriculture and Bioscience International to map the prevalence of crop pests or pathogens around the world, documenting significant variation in where crops and their associated pests and pathogens tend to be found.

From Moscona and Sastry (2022)

Similarly, it is well documented that diseases also vary around the world, due to variations in animal hosts, local climates, demographics, and socioeconomic conditions (see Wilson 2017 for a review).4 For example, the 13 parasitic and bacterial infections that make up ‘neglected tropical diseases’ primarily occur in low-income countries in sub-Saharan Africa, Asia and Latin America (Hotez et al. 2007).

From Hotez et al. 2007

Going beyond differences in the pest and disease burden, a well-known 2010 article by Henrich, Heine, and Norenzayan documents extensive variation in human psychology study results depending on the population under study. In particular, they emphasize that the study populations in behavioral science research are overwhelmingly drawn from Western, Educated, Industrialized, Rich Democracies – so-called “WEIRD” countries (indeed, they point to one study showing nearly 70% of subjects in top psychology journals came from the USA alone!). They show that along many important dimensions, findings that are derived from WEIRD samples do not generalize to the broader human population. To take one example, Henrich and coauthors point to a 1966 cross-cultural study about the Müller-Lyer illusion, presented below. In this study, American undergraduates were more likely to perceive line b to be longer than line a, though the two are actually equal in their length. Others, such as San foragers of the Kalahari, tended to perceive the lines to be of equal length.

From Henrich, Heine, and Norenzayan (2010)

A 2020 retrospective by Apicella, Norenzayan, and Henrich, which looked back on the decade since the 2010 article, found samples drawn from WEIRD countries continued to dominate major journals, even as (infrequent) studies continue to find variation across countries is important.5

Finally, economics presents another domain where results in one country may not generalize to others. For example, Vivalt (2020) assesses the extent to which results from impact evaluations of economic development interventions generalize to new contexts. To do so, the author compiles a dataset of all results across hundreds of impact evaluations covering 20 types of development programs (as an example, one type of development program is conditional cash transfers). Vivalt summarizes the variation by intervention and by intervention-outcome, using meta-analysis methods, and documents that there exists significant variation for the same intervention-outcome across contexts, and that this variation is greater than the variation that exists across other types of interventions, such as medical interventions.
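The kind of meta-analytic heterogeneity statistic behind a claim like Vivalt's can be illustrated with the textbook DerSimonian-Laird formulas, applied here to made-up effect estimates; this is a generic sketch, not Vivalt's code or data.

```python
import numpy as np

def heterogeneity(effects, std_errors):
    """DerSimonian-Laird estimates of between-study variance (tau^2) and I^2
    for one intervention-outcome measured in several different contexts."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(std_errors, dtype=float) ** 2   # inverse-variance weights
    y_pooled = np.sum(w * y) / np.sum(w)                 # fixed-effect pooled estimate
    q = np.sum(w * (y - y_pooled) ** 2)                  # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                        # between-context variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0        # share of variation beyond chance
    return tau2, i2

# Toy example: the same intervention-outcome estimated in five countries.
tau2, i2 = heterogeneity(effects=[0.10, 0.35, -0.05, 0.20, 0.50],
                         std_errors=[0.08, 0.10, 0.07, 0.09, 0.12])
print(f"tau^2 = {tau2:.3f}, I^2 = {i2:.0%}")  # larger values = results vary more across contexts
```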
Trust of evidence from different places

So the problems related to agriculture, disease, human psychology, and economic development are not universal but vary substantially from region to region. If research done in one region is more likely to be related to the problems of that region (and we argued it is, here), then the substantial concentration of research means a lot of problems are receiving very little research effort.

Decision-makers’ beliefs also matter. If people believe research done elsewhere isn’t applicable to their context, then that research is less likely to inform their decisions. That’s true even if the research actually is applicable, but people don’t believe it. And some papers indicate this potential concern is a real one.

Two recent papers attempt to isolate this mechanism in the context of program evaluation evidence. Vivalt et al. (2023) and Nakajima (2021) both investigate how policymakers evaluate potentially relevant research with experiments in which they surveyed policymakers on their views about different hypothetical research papers. In both of these papers, the authors provide policymakers with evidence from sets of hypothetical impact evaluations, and ask them to rank or rate which evaluations they prefer. These hypothetical evaluations vary in their methodologies (RCTs versus observational studies), results, sample size, and, importantly for this post, the location of the study. The two studies find similar results: policymakers tend to have a preference for studies conducted in settings similar to their own country, preferably their own country (Vivalt et al. 2023).

Some related evidence from medical research has similar implications. Alsan et al. (forthcoming) use a similar approach, a survey experiment, to assess how doctors and patients interpret the results of clinical trial data. In this study the authors provided profiles of hypothetical diabetes drugs, which included the drug’s mechanism of action and supporting clinical trials. In a supplementary experiment the authors asked respondents in the United States how much they trusted clinical trial results conducted in different countries. They found that respondents tended to be less confident about the effectiveness of a drug tested outside of the United States, and several respondents expressed concerns that the drug would not work in the same way due to biological, socioeconomic, and environmental factors.

(As an aside, geography is of course not the only factor affecting which kinds of populations are underserved by research. The primary experiment in Alsan et al. (forthcoming) is actually about whether representation of different racial groups in clinical trials influences the likelihood that physicians would recommend that drug to their patients, and whether patients would adhere to the drug regimen. The study randomized the share of Black trial subjects and average drug efficacy in trials across drug profiles. Physicians were asked to indicate their intent to prescribe the drugs, and in a separate experiment, hypertension patients were asked their interest in novel therapies to treat hypertension that had been tested in trial sites with varying shares of Black participants. They found that physicians were more likely to state an intention to prescribe drugs that had been tested on representative samples, and that this effect was driven by doctors who routinely saw Black patients. As for the patients, Black respondents were more likely to state that a drug would work for them if the trial was representative.)

So another rationale fo...
October 2023 Updates
New Things Under the Sun is a living literature review; as the state of the academic literature evolves, so do we. This post highlights some recent updates.

Risk Aversion and Budget Constraints

The post Conservatism in Science looked at some evidence on whether science is biased in favor of incremental science. One argument made in that post is that it’s easier to identify really good research proposals if they rely on a knowledge base reviewers are familiar with. If only really good proposals can be funded because the research budget is too tight, then that might mean more unusual ideas that are harder to evaluate don’t make the cut, creating a bias towards conservatism in science. A new paper provides some further evidence on this point. The updated post now includes the following paragraphs:

A 2023 working paper by Carson, Graff Zivin, and Shrader provides some further support for the notion that, when budget constraints bite, proposals with a greater degree of uncertainty are the first to be dropped. Carson and coauthors conduct a series of experiments on scientists with experience serving as NIH peer reviewers. In one experiment with 250 participants, they showed reviewers a set of ten grant proposals. The title and abstract of these proposals were drawn from real NIH grants, but in the experiment participants were provided with a set of 30 fictional peer review scores, ranging from 1 (best) to 9 (worst). They were then asked to pick four to (hypothetically) fund.

We don’t have a measure of novelty here, but the variance of peer review scores is a potentially informative related measure, as it indicates disagreement among peer reviewers about the merits of a proposal. Carson and coauthors show that, among proposals with the same average score, participants are actually more likely to select proposals with a greater variance in their peer review scores to be funded! But in the next stage of their experiment, they ask participants to imagine their research budget has been cut and now they have to drop one of the four proposals they selected to fund. When asked to tighten their belts, which projects do reviewers in this experiment choose to drop? As we might expect, they cut the ones with the lowest average. But above and beyond that, participants are also more likely to choose to cut the ones with the more variable scores.
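Here is a toy simulation of that two-step setup, to make the quantities involved concrete. The scores are randomly generated, and the belt-tightening rule at the end is just one simple rule consistent with the pattern Carson and coauthors report, not their model of reviewer behavior.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Ten hypothetical proposals, each with 30 reviewer scores on the NIH-style
# scale of 1 (best) to 9 (worst).
scores = pd.DataFrame({f"proposal_{i}": rng.integers(1, 10, 30) for i in range(10)})

summary = pd.DataFrame({
    "mean_score": scores.mean(),     # lower is better
    "score_variance": scores.var(),  # proxy for reviewer disagreement
})

# Step 1: fund the four proposals with the best (lowest) average score.
funded = summary.nsmallest(4, "mean_score")
print(funded)

# Step 2: a budget cut forces dropping one funded proposal. One simple rule in
# the spirit of the experimental pattern: drop the funded proposal with the
# worst mean, breaking near-ties toward the one with more variable scores.
to_drop = funded.sort_values(["mean_score", "score_variance"], ascending=False).index[0]
print("dropped:", to_drop)
```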
Read the whole article

Measuring the extent of knowledge spillovers

A key idea in the economics of innovation is the knowledge spillover: the research work I do tends to benefit people besides myself. This dynamic is an important reason why innovation has unusual properties, relative to other kinds of economic activity. The post Knowledge Spillovers Are a Big Deal looks at some papers to argue that knowledge spillovers matter in practice, as well as in theory. I’ve rearranged that post a bit to highlight two new additions.

First, a new paper by Aslan and coauthors provides descriptive data on the extent of knowledge spillovers in biomedicine. From the article update:

Aslan et al. (2023) show pretty similar results in biomedicine. Since 2008, the NIH has classified its research grants into hundreds of different research categories, such as “cerebral palsy”, “vector-borne diseases”, and “lead poisoning” (to pick three examples at random). How often do grants for one category result in research publications in other categories? Quite often, it turns out.

To see how often this kind of unexpected spillover happens, Aslan and coauthors get data on 90,000 funded NIH grants over 2008-2016, and 1.2mn associated publications. If the NIH and journals used the same classification system, it would then be a simple matter of seeing how often a grant and its publications are assigned the same category (minimal spillovers) versus different categories (large spillovers). But there are two challenges.

First, unfortunately, journals do not classify articles into categories using the same system that the NIH uses to classify its grants. Aslan and coauthors instead use machine learning algorithms to assign journal articles to the NIH’s categories, based on the text of the journal abstracts. Second, the NIH classification system can be too granular for identifying significant knowledge spillovers. For example, there are categories for both “tobacco” and “tobacco smoke and health.” If research dollars are spent on a proposal assigned to the category “tobacco” but then generate a publication tagged as “tobacco smoke and health”, then while it is technically true that the grant generated knowledge applicable to a different category of knowledge than expected, the new category is so similar to the original that it doesn’t really feel like a significant knowledge spillover. To reduce this worry, Aslan and coauthors use a clustering algorithm to cluster categories frequently assigned to the same grants. This results in 32 different clusters of NIH topics. “Tobacco” and “tobacco smoke and health” now fall under the same cluster, for example, so that a grant assigned to “tobacco” but generating research assigned to “tobacco smoke and health” would no longer be classified as a knowledge spillover, since both categories are part of the same cluster.

In the end, 58% of publications are assigned at least one category that is different from the ones assigned to the grant. In other words, more than half of the publications emerging from NIH grants are at least partially about a topic significantly different from the topics that the research grant was originally assumed to be about.
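A minimal sketch of that bookkeeping is below, with a hard-coded category-to-cluster mapping standing in for the clustering step (the real mapping is estimated from the data, and the category names here are just illustrative).

```python
# Hypothetical mapping from fine-grained NIH-style categories to coarser clusters.
# (Aslan and coauthors build this mapping with a clustering algorithm; here it
# is simply hard-coded for illustration.)
category_to_cluster = {
    "tobacco": "tobacco",
    "tobacco smoke and health": "tobacco",
    "lead poisoning": "environmental health",
    "vector-borne diseases": "infectious disease",
}

def is_spillover(grant_categories, publication_categories):
    """A publication counts as a spillover if at least one of its categories
    falls outside the clusters covered by its funding grant."""
    grant_clusters = {category_to_cluster[c] for c in grant_categories}
    pub_clusters = {category_to_cluster[c] for c in publication_categories}
    return bool(pub_clusters - grant_clusters)

# A "tobacco" grant producing a "tobacco smoke and health" paper stays within
# the same cluster, so it is not counted as a spillover...
print(is_spillover({"tobacco"}, {"tobacco smoke and health"}))  # False
# ...but the same grant producing a lead poisoning paper would be.
print(is_spillover({"tobacco"}, {"lead poisoning"}))            # True
```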
The original article also included a discussion of Bloom, Schankerman, and Van Reenen (2013), which showed private sector R&D appears to “spill over” to other firms working on similar technologies, leading to more patents and greater productivity for these peers. The update now (briefly) notes that this paper’s analysis was repeated on a larger dataset in 2019, finding broadly similar results to the earlier paper.

Read the whole thing

Aging Economists

Finally, the post Age and the Impact of Innovation looked at some of the literature on how research impact metrics change over a researcher’s life. The original post looked at Yu et al. (2022) and Kaltenberg, Jaffe, and Lachman (2021), which showed that the average citations received by biomedical scientific research and patents, respectively, decline substantially as scientists and inventors age. We can now add economists to this dataset. A new paper by Kosnik and Hamermesh (2023) finds that as economists get older, the citations to their publications in a set of top journals also decline substantially.

As discussed in the post, the story is actually more complicated than it seems though. One complicating wrinkle discussed in the appendix to that post is that Yu and coauthors show life scientists who do not produce as many papers and whose work isn’t as highly cited drop out of research over time. That means older researchers are, on average, as productive as younger ones, but only because the set of older researchers is limited to the most productive and the set of younger ones includes all the people who will eventually drop out. Kosnik and Hamermesh (2023) also show that economists are less likely to retire if they have published more often in top journals in the preceding decade.

Read the whole thing

Until Next Time

Thanks for reading! If you think the updated posts above are interesting, you might also be interested in the following related posts:

For more on conservatism and science, see Biases against risky research
For more on spillovers, see Adjacent knowledge is useful
For more on age and innovation, see Age and the nature of innovation

As always, if you want to chat about this post or innovation in general, let’s grab a virtual coffee. Send me an email at matt.clancy@openphilanthropy.org and we’ll put something in the calendar.
Literature Reviews and Innovation
This article will be updated as the state of the academic literature evolves; you can read the latest version here. A podcast version will be released next week (traveling this week). Special thanks to Yian Yin for pointing me to Haustein, Costas, and Larivière (2015) and Fang et al. (2020).

We here at New Things Under the Sun are big fans of literature reviews. In a world where perhaps ideas are getting harder to find because of the burden of knowledge, it sure seems like literature reviews, which curate and synthesize a large volume of work, must be important. But is that true? What do we really know about the effects of literature reviews on science and innovation?

Do People Read Literature Reviews?

One indicator of the importance of literature reviews is how well they get cited relative to traditional articles. If they tend to be highly cited, that’s one sign that they’re an important part of the knowledge ecosystem (though obviously not decisive on its own). To assess that, we can pull data from Haustein, Costas, and Larivière (2015), which counts short-run academic citations1 to both traditional and review articles published in 2012. Using the altmetrics database, it also tracks a variety of other indicators; we’ll look at mainstream media mentions, which are part of how research results get communicated to the public at large. Lastly, I’m particularly interested in whether literature reviews are more informative for policy development. To get a handle on that, we can use Fang et al. (2020), which counts citations from 2.7mn policy documents to the academic literature. These policy documents are drawn from around the world, and include government, think tank, NGO, and IGO documents.

The following figure compares the average citations received by review articles to the average citations of traditional articles across three audiences: academia, the policy world, and mainstream media.

Data on academic and mainstream media cites is from the density entries of Table 2 of Haustein, Costas, and Larivière (2015); data on policy document cites is from figure 3 of Fang et al. (2020)

Across the three domains, review articles tend to be more highly cited, on average, than original research. Within academia, review articles are cited by other academic publications at a rate about 2.3x that of traditional articles, at least for this sample of publications from 2012. Reviews are also more highly cited by the policy world, with review articles receiving on average 1.8x as many cites from policy documents per article as traditional articles. Among the mainstream media, the two are cited at the same rate. You get similar results when you look at the probability a particular article type is cited. (One thing the above figure obscures is the vast differences in citation rates across audiences; the policy world cites review and traditional articles at roughly 10-20x the rate the mainstream media does, and the academic world cites them at 30-40x the rate of the policy world!)

There are some caveats to the above. How review articles are identified in academic databases is the subject of some controversy. Moreover, normally it is desirable to normalize citation counts by field; it’s easier to get many more citations, for example, in a field that is very large, compared to one that is very small. If fields differ systematically in their size and how much they use reviews, or in how difficult it is to correctly classify reviews, then that could make the aggregate data above misleading.
In an appendix to this post, I dig into these issues a bit. I don’t think they change any of the substantive conclusions though, so I omit them from the main text. My bottom line is that review articles are significantly more highly cited than traditional articles, on average, in academia and among the policy world. But citation does not necessarily signify genuine influence.2 Let’s turn to some of the (scant) evidence we have on the genuine influence of reviews.

Literature Reviews and Field Formation

We’ll begin with academia. McMahan and McFarland (2021) argue that one of the effects of literature reviews is to draw together work scattered across different microcommunities, often via highlighting the role of papers that can act as bridges between multiple niche communities. To illustrate their argument, let’s start with an example (from their paper). In the figure below, we have two networks representing a field of climate science.

This figure represents a lot of information. In each of these networks, the nodes represent papers cited by a specific review article (“Integrated Assessment Models of Global Climate Change”, published in the Annual Review of Energy and the Environment in 1997). The bigger the node, the more citations the paper has received during a particular time period. In the figure, links between nodes represent how often these papers are cited together by other articles. This is an indication that they are about a topic that is somehow related. Finally, the figure covers two time periods. At left, we have the linkages between articles in climate science during the seven years preceding publication of the review article that references all these publications. At right, the seven years after publication.

From McMahan and McFarland (2021)

We can see how a field changes by studying the changes between these two networks. Let’s start with the network on the left. Prior to the publication of the review article, we can see a few different clusters of papers: one cluster (in blue) for integrated assessment models of regional policy; one cluster (in green) for integrated assessment models related to uncertainty; and one in yellow for climate modeling. That is, in the seven years preceding publication of this literature review, there were, roughly speaking, a few different subcommunities that worked on different niche topics in climate modeling. We see this through the frequent co-citation of articles within each cluster and infrequent co-citations between clusters. If I’m writing about modeling uncertainty, I am likely to cite more than one of the papers in the uncertainty cluster, but less likely to cite any papers in the climate modeling cluster.

After the review is published, we no longer see these three distinct clusters. Instead, we have moved towards one denser cluster with more of a hub and spoke structure. Papers from the various original clusters are now frequently co-cited with papers in formerly separate clusters, and especially with a few major papers, which previously bridged different clusters. This is most clear for paper 1, which in the left figure is not highly cited, but is co-cited with papers in two different clusters and has now become highly cited. After the review, it’s now the central hub of a dense network of papers.
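To make the network construction concrete, here is a minimal sketch of how one can build a co-citation network from citation records and compute the kinds of summary statistics compared below (cluster counts, path lengths, hub papers). The citation records are invented, and this is a generic illustration rather than McMahan and McFarland's code.

```python
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Invented citation records: each citing article lists the papers (A-E) from the
# reviewed field that it references. Two papers are linked when cited together.
citing_articles = {
    "citing_1": ["A", "B"],
    "citing_2": ["A", "B", "C"],
    "citing_3": ["C", "D"],
    "citing_4": ["D", "E", "A"],
}

G = nx.Graph()
for refs in citing_articles.values():
    for u, v in itertools.combinations(sorted(refs), 2):
        # edge weight = number of times the pair is co-cited
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

# Summary statistics of the kind compared before vs. after a review appears.
clusters = greedy_modularity_communities(G, weight="weight")
print("number of clusters:", len(clusters))
print("average shortest path length:", round(nx.average_shortest_path_length(G), 2))
print("top co-citation hubs:",
      sorted(G.degree(weight="weight"), key=lambda kv: kv[1], reverse=True)[:2])
```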
McMahan and McFarland show this kind of pattern isn’t an anomaly specific to climate science, but a pattern that broadly follows the publication of a review article. They build a dataset based on all the Annual Review articles published between 1990 and 2016, as well as all the articles published in a set of more than 1,000 major journals. The set of articles published in Annual Review journals forms their set of literature reviews, since this journal series specializes in review articles. They then use statistical analyses to establish some reliable associations. After an Annual Review article is published:

The network of cited articles is divided into fewer distinct clusters
The number of steps in a chain of citation between two different papers shrinks (for example, because most papers are now co-cited with at least one major hub paper)
Most papers start to receive fewer citations, but a small number start to receive more

Those three traits largely match the consolidation dynamics in the illustration: less separation into distinct clusters, and a few papers emerging as central hubs (with the rest perhaps a bit left behind). That doesn’t necessarily prove that it was the Annual Review article that caused these changes though. It’s quite plausible that these dynamics are merely the natural evolution of fields. Maybe Annual Review articles merely act as records of processes that are underway with or without them, much in the way that newspapers record the great events of the day without causing them.

Ideally, we would want to run an experiment, where we get Annual Reviews to commission a bunch of literature reviews, but then randomly publish only some of them. We could then compare the evolution of the network structure of cited references in the published and unpublished articles. McMahan and McFarland can’t do that; but they try the next best thing, which is to at least identify sets of cited articles that look like they could be the target of an Annual Review article, but which do not in fact get one (maybe for random reasons). Let’s call these the reviewed and unreviewed networks. If both reviewed and unreviewed networks look the same before Annual Review articles are published, and different afterwards, then that’s some evidence the Annual Review publication induced the change.

To identify a set of unreviewed networks that closely resemble reviewed networks (prior to publication), they look at the citation networks of traditional articles. Specifically, they identify a subset of articles whose co-citation networks resemble the co-citation networks of the cited references in an Annual Review article, in terms of the number of clusters and length of citation paths between papers, and where the cited documents also are of a similar “age” and receive similar numbers of citations as in the reviewed set. McMahan and McFarland then look at how the reviewed and unreviewed co-citation networks evolve in the wake of an Annual Review article being published (for the reviewed networks) or a traditional article (for the unreviewed). They find t...
September 2023 Updates
New Things Under the Sun is a living literature review; as the state of the academic literature evolves, so do we. This post highlights some recent updates. One theme of this update is responding to feedback, which is always welcome. Thanks!

Peer Review

The article “What does peer review know?” surveyed some studies that compare peer review scores to long-run outcomes, both for grant proposals and journal submissions. It argues peer review scores do predict long-run outcomes, but only with a lot of noise. Misha Teplitskiy pointed me to some additional papers on this topic, which reinforced this point. The updated article now includes the following section.

Gallo et al. (2014) obtain pretty similar results to those above for the peer review scores of the American Institute of Biological Sciences, an organization that provides expert peer review services for clients. In the figure below, on the horizontal axis we see the peer review scores for 227 projects reviewed by American Institute of Biological Sciences peer reviewers that were ultimately funded. These range from 1 (the best) to 5 (the worst) (note the figure stops at 4; no projects receiving a score worse than that were funded). On the vertical axis we have a normalized count of all the citations to publications that emerged from the grant. As with the NIH data, we again observe a noisy but pretty consistent relationship: the better the peer review score, the more citations eventually earned.1

From Gallo et al. (2014)

Clavería et al. (2000) obtain similar results in a review of 2,744 proposals funded by the Spanish Health Research Fund over 1988-1994. In this case, the peer review data available is pretty coarse: Clavería and coauthors just know if reviewers classified projects as “excellent/good”, “acceptable”, or “questionable/rejected.” However, a distinguishing feature of this study is that in 1996 the authors arranged for each of these proposals to be reviewed retrospectively by new reviewers. These reviewers looked at the original proposals, the annual and final reports, and published papers originating from the project, and assigned each of the now-completed proposals a score of 1-10 (higher is better) for its actual scientific performance. So, if we are concerned that quantitative indicators like citations or publication counts are inappropriate ways to evaluate science, this study gives us a more holistic/subjective assessment of research quality.

The study again finds that peer review scores are noisily correlated with measures of quality. Spanish Health Research Fund proposals were reviewed by two commissions, one comprised of experts with topical expertise, and one with experts from related fields. After controlling for research level, duration, budget, and year of project onset, projects that received an “excellent/good” review at the proposal stage from the related field commission were rated 0.3 points higher when the completed projects were reviewed (recall, on a ten point scale). An “excellent/good” review from the commission with more direct topical expertise was associated with a rating 0.7 points higher. (If you do not adjust for research level and the other factors, the association is a bit stronger.) Again - better peer review scores seem to be associated with better outcomes, but the association isn’t super strong (for context, the average rating for completed projects was 5.0/10).
The rest of the article turns to similar evidence from peer review reports to journal submissions. Read the whole article

Screening for Statistical Significance?

Turning to the effects of peer review and editor discretion on publication bias, the article “Publication bias without editors? The case of preprint servers” looks at the causes of publication bias. It could be that publication bias arises at the journal submission stage; maybe editors and peer reviewers screen out papers that find non-significant results? The article looks at preprint servers to see if that’s so, and argues such a process is not the main driver of publication bias. It is not merely the case that reviewers bounce all the submitted papers that obtain results that are not statistically significant. Instead, such papers do not seem to even be written up and submitted.

A new paper by Brodeur et al. provides quite clear evidence of this dynamic by following submissions and publications at the Journal of Human Resources. I’ve incorporated discussion of that paper into a discussion of another (Brodeur, Cook, and Heyes 2020), already covered in the original version of the article. We pick up after describing how you can identify the statistical fingerprints of p-hacking by looking for a suspicious pileup of test-statistics that are just barely statistically significant (and hence, perceived to be publishable).

Brodeur, Cook, and Heyes (2020) and Brodeur et al. (2023) look for [a] suspicious pileup right above the conventional thresholds for statistical significance. The set of four figures below plots the distribution of two kinds of test statistics found in various samples of economics papers. The top row, from Brodeur, Cook, and Heyes (2020), plots the distribution of something called a z-statistic, which divides the estimated effect size by its standard error. A big z-statistic is associated with a precisely estimated effect that is large - those are places where we can be most confident the true effect is not actually zero. A small z-statistic is a small and very imprecisely estimated effect size; those are places where we worry a lot that the true effect is actually zero and we’re just observing noise. The bottom row, from Brodeur et al. (2023), plots a closely related statistic, a p-value, which is (colloquially) the probability a given set of data would arise simply by chance, if there is no genuine effect out there.

Top row from Brodeur, Cook, and Heyes (2020), bottom row from Brodeur et al. (2023)

There are two interesting things we can read off this figure. First, we look to see if there is a suspicious pileup right above (for z-statistics, so top row) or below (for p-values, so bottom row) important thresholds. Those thresholds are indicated by vertical lines and each distribution shows spikes of test statistics just barely in the statistically significant range. In other words, lots of papers just happen to be finding results that are barely statistically significant by conventional standards. The second interesting thing relates to the similarity of these patterns across the four figures. In the top-right, we have the distribution of test-statistics from papers published in top 25 economics journals in 2015 and 2018. In the top-left, Brodeur and coauthors go back and identify the earlier working paper versions of these papers and do the same analysis.
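To make the idea of “bunching just above the threshold” concrete, here is a minimal sketch of the kind of check involved (simulated data and hypothetical variable names; my own illustration, not the authors’ actual code):

```python
# Minimal sketch: compute z-statistics from reported coefficients and standard
# errors, then compare the mass just below vs. just above the 5% significance
# threshold (|z| = 1.96). Simulated data; not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are coefficients and standard errors collected from papers.
coefs = rng.normal(loc=0.5, scale=1.0, size=10_000)
ses = rng.uniform(0.2, 1.0, size=10_000)

z = np.abs(coefs / ses)  # z-statistic: effect size divided by its standard error

threshold, window = 1.96, 0.25
just_below = np.sum((z > threshold - window) & (z <= threshold))
just_above = np.sum((z > threshold) & (z <= threshold + window))

# With no p-hacking the distribution should be smooth through the threshold;
# a large excess just above it is the suspicious spike the papers look for.
print(f"just below 1.96: {just_below}, just above 1.96: {just_above}")
```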
For the purposes of the current discussion, the main point is that this anomalous distribution of test statistic results is already there in the working paper stage. If we interpret this as evidence of p-hacking, it’s telling us that researchers don’t do it when reviewers complain - they do it before they even submit to reviewers. A limitation of the top row is that we don’t actually see how peer review affects what gets published. We started with the set of published papers, and then looked back to see what those papers looked like when they were just working papers. But we don’t know if the stuff that wasn’t published was better or worse, in terms of evidence for p-hacking. That’s where the second row comes in. Although it’s a more limited sample, in the bottom left we now have a large sample of papers that were submitted to one particular journal. In the bottom right, we have the papers that ended up being published. Again, there’s not a large difference between the two. It’s not really the case that economists submit papers without much evidence of p-hacking but then peer reviewers only publish the stuff that exhibits signs of p-hacking. If it’s there, it’s there from the start. (Aside - Brodeur et al. 2023 actually finds some evidence that editors are a bit more likely to desk reject papers with results that are just barely statistically significant, while peer reviewers display the opposite tendency. The two effects seem to mostly wash out. For more on the relative merits of accountable individual decision-makers, such as editors, relative to peer review, see Can taste beat peer review?) Read the whole article Variation in Publication Bias The preceding argued that publication bias ultimately stems from researchers anticipating a better reception for papers that obtain statistically significant results. But as highlighted in “Why is publication bias worse in some disciplines than others?” an additional puzzle is why this problem seems to be worse in some fields. I’ve updated this article to incorporate discussion of Bartoš et al. (2022), which uses more sophisticated methods to assess the extent of publication bias across different fields. After discussing how different forms of publication bias can lead to unusual distributions of statistical test statistics, and how Bayesian model averaging can leverage those distortions to assess the likelihood of different forms of bias, the post continues: Bartoš et al. (2022) identify about 1,000 meta analyses across environmental sciences, psychology, and economics, covering more than one hundred thousand individual studies (the lion’s share in economics), and another 67,000 meta-analyses in medicine that cover nearly 600,000 individual studies in medicine. For each field, they see how likely it is that different sets of assumptions would generate data displaying these patterns, and then how likely it is that each of these models is “correct.” Lastly, once they have the probability all these different models are correct, they can “turn off” publ...
Big firms have different incentives
This post is a collaboration between me and Arnaud Dyèvre (@ArnaudDyevre), a PhD student at the London School of Economics working on growth and the economic returns to publicly funded R&D. Learn more about my collaboration policy here. This article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

In a previous post, we documented a puzzle: larger firms conduct R&D at the same rate as smaller firms, despite getting fewer (and more incremental) innovations per R&D dollar. Why wouldn’t firms decelerate their research spending as the return on R&D apparently declines? In this follow-up post, we look at one explanation: firms of different sizes face different incentives when it comes to innovation. In a later post, we’ll review another explanation, that large firms have different inventive and commercialization capabilities.1

Cost spreading and invisible innovations

To start, let’s revisit our claim that the return to R&D seems to fall as firms get larger. Is this accurate? We can think of the returns to R&D as the “results” a firm gets out of R&D, divided by that firm’s R&D “effort.” Typically we measure those “results” by new patents, products, or streams of profit. It turns out some of these measures might understate innovation by large firms, because larger firms are more likely to generate process rather than product innovations. Process innovations are concerned with better ways of delivering a service or manufacturing a product, not creating a new business line. Process innovations will not show up directly in product-based measures of innovation.2 For example, some earlier posts have looked at the introduction of new consumer products or the attributes of car models as measures of the output of innovation.

And while process innovations can be patented, they are probably less likely to be patented than new products. For example, a 1994 survey (Cohen, Nelson and Walsh 2000) asked 1,500 R&D labs in the manufacturing sector to rank five different ways of capturing the value of new inventions. Among the 33 different sectors to which the firms belonged, just 1/33 thought patents the most effective way to protect process inventions, compared to 7/33 who thought them the most effective way to protect a new product invention. In contrast, 16/33 sectors thought patents the worst way to protect new process inventions, compared to 10/33 who thought them the worst way to protect product inventions. Another way to summarize the survey is to note that only 23% of respondents reported that patents were effective means to appropriate process innovations while 35% considered them effective to appropriate product innovations.

If process innovations are less likely to find their way into the catalogues of new products or the patent portfolio of firms, then they are less likely to be picked up by conventional measures of innovation. If larger firms are disproportionately likely to engage in process innovation, that will make it seem as if larger firms get fewer results from their R&D. And we do have some evidence large firms are more process-innovation oriented. Liu, Sojli, and Tham (2022) use natural language processing to try and classify patents as protecting process or product innovations. The main approach breaks the titles of patents and their claims into multiple components, and then looks to see if these strings of words contain words like “process”, “method” or “use” (which indicate a process), or words like “product”, “apparatus” or “tool” (which indicate a product). When they ask patent examiners and an IP management firm to classify a random sample of hundreds of patents, the human classifications and the algorithm come up with the same answer around 90% of the time.
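As a rough illustration of that style of keyword matching (a toy sketch with illustrative word lists of my own; not Liu, Sojli, and Tham’s actual algorithm):

```python
# Toy sketch of keyword-based classification of patents as "process" vs.
# "product" innovations. Illustrative word lists only; the actual method in
# the paper is more involved.
import re

PROCESS_WORDS = {"process", "method", "use"}
PRODUCT_WORDS = {"product", "apparatus", "tool"}

def classify_patent(text: str) -> str:
    """Classify a patent title/claim string by counting keyword hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    process_hits = sum(t in PROCESS_WORDS for t in tokens)
    product_hits = sum(t in PRODUCT_WORDS for t in tokens)
    if process_hits > product_hits:
        return "process"
    if product_hits > process_hits:
        return "product"
    return "ambiguous"

print(classify_patent("Method and process for curing rubber"))    # -> process
print(classify_patent("Apparatus and tool for sealing bottles"))  # -> product
```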
They show that, over 1976-2020, US public firms with more active process patents than active product patents tend to be larger. We also have some non-patent evidence, though it’s based on pretty old surveys at this point. Akcigit & Kerr (2018) match Census data on U.S. firms to a comprehensive survey of R&D activities by the NSF (covering 1979-1989) and find a positive correlation between firm size (defined here as log employment) and the share of R&D dedicated to process innovation.

So both the patent and survey-based evidence suggests larger firms do more process innovation than product innovation. And we also have pretty good theoretical reasons to expect this should be the case. As Matt has written elsewhere, when a particular kind of technology gets more profitable to invent, firms do more R&D on that kind of technology. To the extent the profitability of different kinds of R&D differs as firms scale, it’s not surprising that their R&D choices should differ. For example, larger firms typically have a wider portfolio of products and sell more products in each line, so it makes sense for them to find more efficient ways to produce and deliver these products because they can spread the costs of their process innovation over more products and product lines. If you expect to sell ten thousand cars, it’s worth $10,000 to invent a process that reduces the cost of manufacturing by $1 per car. If you expect to sell a million, you’ll pay $1 million to invent the same technology. This explanation has been referred to as the cost spreading advantage of larger firms in conducting R&D: the bigger the firm, the greater the level of output over which it can apply its process R&D. Cost spreading pushes bigger firms toward process innovation. So one reason we may observe fewer innovations per dollar among large firms is that their size incentivizes them to focus on harder-to-observe process improvements.

More speculatively, it might be that a similar dynamic also affects our measurement of the inputs to R&D, in a way that further biases our measures of the R&D productivity of firms. It has long been suggested3 that smaller firms might underreport R&D expenditures, which would tend to inflate their measured R&D productivity (because they would seem to get more from less). One reason for that might be that, if firms can receive tax breaks for R&D expenditure, larger firms may invest more in sophisticated ways of claiming these breaks, either via more careful documentation or by pushing the boundary of what can be claimed as an expense. It’s kind of a cousin to cost-spreading; if there is a fixed cost of aggressively reporting R&D spending (for example, because you have to hire more tax lawyers), that cost might be more worth enduring for larger firms with more plausible R&D expenses. Boeing & Peters (2021), for example, provide evidence that R&D subsidies are often used for non-research purposes in China. And this isn’t the only possible reason small firms might under-report R&D.
Roper (1999) suggests it could also be because it’s harder to measure R&D spending in smaller firms that don’t have full time research staff or dedicated research labs (and so it’s harder to tell what’s R&D and what’s not). That said, while it seems plausible, I’m not aware of evidence that documents biased R&D reporting. Indeed, in Boeing and Peters (2021), they actually do not find any statistically significant correlation between the size of firms and their tendency to mis-report R&D. The Replacement Effect The cost spreading incentive pushes firms toward process innovation, which might be harder to observe but should still be considered a form of genuine innovation. Another incentive pushes them away from product innovation though: the replacement effect. If a better version of a product is invented, most people will buy the improved version rather than the older one. If you are an incumbent firm that was previously selling that older version, that’s a reason to be less excited about a new product: if you invent a new product, you are partially competing against yourself. If you’re an entrant though, you won’t care. Since incumbents will tend to be larger firms, this dynamic might also explain differences in how firms innovate as they grow larger. This is an old argument in economics, dating back to Kenneth Arrow (1962), which was later named the ‘replacement effect.’4 Incumbent firms’ reluctance to do R&D in domains that could threaten their core business is closely related to what is sometimes called the innovator’s dilemma in the business literature and is a core tenet of some endogenous growth models.5 The recent development of chatbots powered by large language models offers a possible illustration of this dynamic. Google seems to have underinvested in the type of AI technology powering OpenAI’s ChatGPT because it would be a direct siphon of the ads revenues generated by its own search engine. As a result, Google is finding itself having to make up for lost ground in the AI race it once dominated. Documenting the extent of the replacement effect at large is a bit tricky because you are looking for R&D that doesn’t happen. One way we could do this is if we came up with a bunch of good ideas for R&D projects and randomly gave the ideas to large and small firms. We could then see which firms ran with the ideas and which ones left them alone. The trouble is, it’s hard enough for firms to come up with good ideas for themselves, let alone innovation researchers to come up with good ideas for them. But there are two studies that are related to this thought experiment. Cunningham, Ederer, and Ma (2021), while not about innovation and the size of firms specifically, provides some excellent documentation of replacement effect style dynamics. Their context is the pharmaceutical sector, where it is quite common for large incumbent firms to source new R&D projects from small startups. The sector is also one where there is high quality data available on the different research projects (here, new drug compounds) that firms are working on. Cunningham, Ederer, and Ma...
Geography and What Gets Researched
This post was jointly written by me and Caroline Fry, assistant professor at the University of Hawai’i at Manoa! Learn more about my collaboration policy here. How do academic researchers decide what to work on?  Part of it comes down to what you judge to be important and valuable; and that can come from exposure to problems in your local community. For example, one of us (Matt), did a PhD in Iowa, and ended up writing a paper on the innovation impact of ethanol-style policies (ethanol is a big business in Iowa). One of us (Caroline), was leaving Sierra Leone after two years there, just as the Ebola epidemic was starting. She became interested in understanding why science capacity is so low in some countries and not others, and what that means for the development of drugs and vaccines to combat local problems. (Indeed, we’ll talk about two of the papers that emerged from that research program in just a minute.) Brief Pause for Some Announcements The Institute for Replication is looking for researchers interested in replicating economic and political science articles. Research using non-public data (for example, Bell et al. 2019, discussed below) is a formidable barrier for reproducibility and replicability - so they are offering up to 5,000 USD and coauthorship on a meta-paper combining hundreds of replications. A list of studies of eligible studies is available here, with payment info. Please contact instituteforreplication@gmail.com for more detail and indicate which study you would like to replicate. They are interested in 3 types of replications: (i) using new data, (ii) robustness checks and (iii) recoding from scratch. Open Philanthropy’s Innovation Policy program is currently soliciting pre-proposals from individuals for financial support to write living literature reviews about policy-relevant topic areas. Interested individuals should have a PhD related to their proposed area and should contact matt.clancy@openphilanthropy.org for more information. This article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here. Back to Geography and What Gets Researched! Subscribe now Testing the relationship between location and research choice Both of us made research decisions that were, in part, influenced by exposure to local problems. Are we atypical, or is this path of exposure to research choice a common one? The role of exposure to local problems in determining research choice is difficult to test. People might locate themselves in places precisely because they are interested in the problems in those places. The ideal way to test this would be to randomly assign researchers to different locations and see if they work on local problems that they are exposed to. However, randomly assigning researchers usually isn’t particularly feasible. Alternatively, we could randomly “assign” problems to different locations and see if local researchers begin working on those problems after exposure. One candidate for a problem that all-but randomly arises in some locations but not others is a novel disease outbreak. So one way to assess how strong is the local problems to local research link is to see how scientists respond to local disease outbreaks. 
Fry (2022) takes this strategy and evaluates the impact of the 2014 West African Ebola epidemic on the publication output of endemic country scientists: did scientists working in areas hit harder by Ebola begin to disproportionately work on it? To see, Fry starts with a dataset of 57 endemic country biomedical scientists (those affiliated with institutions in Sierra Leone, Guinea and Liberia, the three hardest hit countries, at the time of the epidemic). She then matches these endemic country scientists to 532 control scientists who are from non-endemic countries in West or Central Africa, but who are at similar points in their career, work in similar areas, publish at similar rates, have similar rates of international collaboration, and reside in countries with similar GDP per capita. She pulls out the publication record for each sample scientist for the four years before and six years after the epidemic from the Elsevier Scopus publication database, and creates counts of annual publications. Finally, she separates these counts into Ebola and non-Ebola publications through a keyword search of the title, abstract and keywords of the publications.

Fry compares the changes in publication output of endemic country scientists to that of the control scientists, adjusting for persistent differences between individual scientists, typical career age trends, and variation in publication trends over time for all scientists. As illustrated in the figure below, prior to 2014 none of the scientists in her sample really focused on Ebola. Beginning in 2014, endemic country scientists experience a large and fairly sustained increase in their output of Ebola-related publications, as compared to non-endemic country scientists. That implies exposure to a new problem in a researcher’s location can shift their attention towards that problem. (It could be about something besides exposure too – we’ll talk about that later.)

From Fry (2022)

Location and research focus are correlated

We noted above that our ideal experiment would randomly allocate scientists to different locations. While we may not be able to do that, scientists do change locations of their own accord, and insofar as local problems drive research choice, we might expect to see similar patterns when they do. Fry (2023) tests exactly this. The working paper builds a dataset of 32,113 biomedical scientists affiliated with an African institution between 2000 and 2020, their publication output in different disease areas (by extracting keywords from the titles and abstracts of their publications), and uses the affiliation listed in these publications to infer their country affiliation in each year. She then compares the research choices of these African scientists (proxied by the number of publications on each disease) with the disease burden in their country of residence. The idea is to compare the disease focus of mobile researchers before and after their move to that of matched control researchers who don’t migrate. She finds, indeed, that researchers are more likely to publish papers on diseases that are more prevalent in their host country after they move there. This trend is particularly salient for researchers moving into Africa from outside the continent. And note, this is relative to matched scientists who did not move, but prior to the move were publishing at similar rates, on the same diseases, as the scientists who move.
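Both Fry papers rely on the same basic difference-in-differences logic: compare treated scientists to matched controls, before and after exposure. Here is a minimal sketch of that kind of regression (hypothetical variable names and a deliberately simplified specification using the statsmodels library; not Fry’s actual model):

```python
# Minimal sketch of a difference-in-differences comparison in the spirit of
# Fry (2022): endemic-country scientists vs. matched controls, before vs.
# after 2014. Hypothetical column names; not the actual specification.
import pandas as pd
import statsmodels.formula.api as smf

# df has one row per scientist per year, with columns:
#   scientist_id, year, ebola_pubs, treated (1 = endemic-country scientist),
#   post (1 = year >= 2014)
df = pd.read_csv("scientist_year_panel.csv")  # hypothetical file

# Scientist and year fixed effects absorb persistent differences between
# individuals and common publication trends; the interaction term is the
# difference-in-differences estimate.
model = smf.ols("ebola_pubs ~ treated:post + C(scientist_id) + C(year)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["scientist_id"]})
print(result.params["treated:post"])
```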
We can see similar dynamics beyond the specific context of neglected tropical diseases. Moscona and Sastry (2022) provide some additional data from global agriculture, where there is substantial international variation in crop pests and pathogens. Moscona and Sastry search for the names of specific pests and pathogens in the titles, abstracts, and descriptions of agricultural patents across the world (using a dataset on international crop pests and pathogens from the Centre for Agriculture and Bioscience International). For example, there might be a patent for a pesticide to control a specific kind of pest, or a patent for a gene that confers resistance to some kind of pathogen. Since inventors list their country of residence on patents, Moscona and Sastry can see if inventors disproportionately invent technologies that mitigate pests and pathogens present in their country of residence.

That seems to be the case. In the figure below, they show that for any given crop pest or pathogen (which they call a CPP), the number of patents by inventors in the countries where those pests and pathogens are found is much higher than the number of patents by inventors from other countries. Moscona and Sastry also statistically estimate the relationship between patents on a given pest or pathogen by a country’s inventors and the presence of those pests/pathogens in that country, holding country and pest/pathogen differences fixed. That analysis also finds local presence is a strong predictor of local patenting related to a given pest or pathogen.

From Moscona and Sastry (2022)

Why would location affect research choice?

Taking this cluster of papers as providing at least preliminary evidence that location influences research choice, the next question is: why? We’ve suggested it could be due to researchers being exposed to local problems, and that’s certainly one likely channel. It would be consistent, for example, with research finding that women scientists are more likely to work on issues that disproportionately affect women (suggesting that different researchers find different problems more salient and important to investigate). But a researcher’s location could influence their choice of topics in a number of other ways too. Researchers around the world might be equally interested in a topic, but local researchers could have an advantage in studying a particular topic because of better access to local data, for example, samples of viruses, pests, pathogens, or infected people. It may also be that local funders of research, rather than researchers themselves, are more likely to know and care about local problems. (That said, at least in the case of the 2014 Ebola epidemic, Fry 2022 finds no correlation between domestic funding for Ebola research and the shift towards it.) Beyond these direct effects of location on research choice, one secondary effect could be social contagion from other researchers: even if researchers are not initially motivated to study local problems, they may want to locally collaborate, and if local collaborators are more likely to be working on local problems, they are more likely to begin working on the topic too. We do have some evidence that res...
How to Impede Technological Progress
“Everything that’s happening is coordinated by someone behind the scenes with one goal: to completely ruin scientific research.” – Da Shi, in The Three Body Problem by Liu Cixin Most of the time, we think of innovation policy as a problem of how to accelerate desirable forms of technological progress. Broadly speaking, economists tend to lump innovation policy options into two categories: push and pull policies. Push policies try to reduce the cost of conducting research, often by funding or subsidizing research. Pull policies try to increase the rewards of doing research, for example by offering patent protection or placing advance orders. These have been extensively studied and while they’re not silver bullets I think we have a good evidence base that they can be effective in accelerating particular streams of technology. But there are other times when we may wish to actively slow technological progress. The AI pause letter is a recent example, but less controversial examples abound. A lot of energy policy acts as a brake on the rate of technological advance in conventional fossil fuel innovation. Geopolitical rivals often seek to impede the advance of rivals’ military technology. Today I want to look at policy levers that actively slow technological advance, sometimes (but not always) as an explicit goal. I think we can broadly group these policies into two categories analogously to push and pull policies: Reverse push (drag?): Policies that raise the costs of conducting research. Examples we’ll look at include restrictions on federal R&D funding for stem cell research, and increased requirements for making sure chemical research is conducted safely. Reverse pull (barrier?): Policies that reduce the profits of certain kinds of innovation. We’ll look (briefly) at carbon taxes, competition policy, liability, and bans on commercializing research. The fact that conventional push and pull policies appear to work should lead us to believe that their reverses probably also work; and indeed, that’s what most studies seem to find. But there are some exceptions as we’ll see. Brief Pause for Some Announcements If you’re a fan of what I’m doing here at New Things Under the Sun, and want to write something yourself, you may be interested in the following: Interested in collaborating with me on a post? Click here for details. The Roots of Progress Blog-Building Intensive is a new 8-week (free!) program for aspiring progress writers to start or grow a blog. Learn more or apply here. Open Philanthropy’s Innovation Policy program is currently soliciting pre-proposals from individuals for financial support to write living literature reviews about policy-relevant topic areas. Interested individuals should have a PhD related to their proposed area and should contact matt.clancy@openphilanthropy.org for more information. Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here. Subscribe now Back to the Article Reverse Push Policies Sort of Working Let’s start with two studies that have the effect of making it more expensive (in terms of time or money) to do certain kinds of research. Both these studies are going to proceed by comparing certain fields of science that are impacted by a new policy, to arguably similar fields that are not impacted by the policy. 
By seeing how the fields change relative to each other both before and after the new policy, we can infer the policy’s impact. Let’s start with US restrictions on public funding for research involving human embryonic stem cells. The basic context is that in 1998, there was a scientific breakthrough that made it much easier to work with human embryonic stem cells. While this was immediately recognized as an important breakthrough for basic and applied research, a lot of people did not want this kind of research to proceed, at least if it was going to result in the termination (or murder, depending on your point of view) of human embryos. A few years later, George W. Bush (who was sympathetic to this view) won a closely fought US presidential election, and in August 2001 a new policy was announced that prohibited federal research funding for research on new cell lines. Research reliant on existing cell lines was still eligible for funding, but since most of the existing cell lines were not valuable for developing new therapies, this restriction was more significant than it might naively seem. No restrictions were placed on private, state, or local funding of human embryonic stem cell research, but anyone who received funds for this kind of work would need to establish a physically and organizationally separated lab to receive federal funding for permissible research on existing lines.

To see how this policy change affected subsequent research, Furman, Murray, and Stern (2012) identify a core set of papers about human embryonic stem cell research and RNAi, another breakthrough from the same year, also originating in the US, that was unaffected by the policy but was perceived to be of similar scientific import. They then look at how citations to those core papers evolve over time, with the idea that a citation to one of these core papers is a (noisy) indication that someone is working on the topic. Because foreign scientists are unaffected by US policy, they also divide these citations into those coming from papers with US researchers and those without. They estimate a statistical model predicting how many US and foreign citations a core paper in either topic receives, in each year, as a function of its characteristics.

A key finding is illustrated in the following figure, which tracks the percentage change in citations from US-authored articles to human embryonic stem cell research, as compared to a baseline (which includes RNAi papers, and citations from foreign-authored articles). Prior to 2001, citations by US authors to papers on human embryonic stem cells were about 80% of a baseline, but the error bars were wide enough so that we can’t rule out no difference from baseline. Beginning in 2001 though (when the policy was announced), US citations to these papers dropped by a pretty noticeable amount - from roughly 80% of baseline to 40%.

How citations from US authors to human embryonic stem cell papers fare, compared to a baseline. From Furman, Murray, and Stern (2012)

Note though: just three years later, in 2004, things may have been back to their pre-2001 levels. But the restrictions on federal research weren’t relaxed in 2004. So what’s going on? We’ll return to this later. For now, let’s turn to another study that shows reverse push policies (of a sort) can exert a detectable influence on basic research. This time, we’ll look at a policy whose goal was not to reduce the amount of research, but instead to simply make sure it was done in a safer manner.
In 2008 Sheharbano (Sheri) Sangji died in a tragic UCLA chemistry lab accident involving flammable compounds. This incident and the subsequent criminal case for willful violation of safety regulations by the lab’s principal investigator and the Regents of the University of California galvanized a significant ratcheting up of safety regulations across US chemistry labs. For example, at UCLA, participants in lab safety classes rose from about 6,000 in 2008, to 13,000 in 2009 and 22,000 in 2012, while the number of safety inspections of labs rose from 1,100 in 2008, to 2,000 in 2009 and 4,500 in 2012. This was accompanied by an increase in laboratory safety protocols and more stringent rules for the handling of dangerous chemicals. To see what impact the increase in safety requirements had on chemistry research, Galasso, Luo, and Zhu (2023) gather data on the publications of labs in the UC system. They end up with data on the publications of 592 labs, published between 2004 and 2017 (note they exclude the lab where Sangji worked). To assess the impact of more stringent safety regulations, they cut the labs into two different pairs of sub-samples, with one half of each pair more impacted by the policy and the other half less impacted.  First, they hire a team of chemistry PhD students to classify labs as “wet”, which are equipped to handle biological specimens, chemicals, drugs, and other experimental materials, and “dry”, which are not and might do computational or theoretical research (these comprise 14% of labs). We should expect safety requirements to not affect dry labs, but possibly to affect wet ones - but not if they rarely work with dangerous compounds. So, as a further test, Galasso and coauthors use data on the chemicals associated with lab publications to identify a small subset of labs that most frequently work with compounds classified as dangerous. Because they need a long time series prior to 2008 for this classification exercise, they can only apply this method to 42 labs, out of which they flag the 8 working most often with dangerous compounds. Their main finding is that the impact of the increased safety requirements were pretty small. Indeed, comparing the publication output of wet labs and dry labs, there appears to be no detectable impact of the policy at all, even when trying to adjust for the quality of publications by adjusting for the number of citations received, or after taking into account potential changes in the sizes of labs. The effects were not totally zero though. When they zero in on labs using the most dangerous compounds, they find that after safety standards are ratcheted up, the most high-risk labs begin to publish about 1.2 fewer articles per year mentioning dangerous substances as compared to less dangerous wet labs (labs publish an average of 7.7 articles per year in the sample). The reduction is most pronounced for articles mentioning flammable substances, or dangerous compounds that haven’t ...
Does Advanced AI Lead to 10x Faster Economic Growth?
Dear readers, I’m still writing the next New Things Under the Sun post, but in the interim, I hope you’ll find this debate I had with Tamay Besiroglu as fascinating as I did.1 It’s about the claim that, once we develop AI that can do anything (mental) a human worker can do, the economy will start to grow much, much, much faster. This claim is actually implied by some pretty mainstream models of economic growth! Tamay and I had this debate in slow motion, in a shared Google Doc, over a few months, and it was published in Asterisk Magazine on Friday. In the debate, I’m the skeptic and Tamay the advocate. While I think it’s pretty likely sufficiently advanced AI would lead to (somewhat) faster economic growth, I think growth of 20% per year and up is pretty unlikely. In contrast, Tamay thinks 20% annual growth and faster is pretty likely, if we successfully develop AI that can do every kind of human mental work. If you’re unfamiliar with this debate, I think we cover the fundamentals well. But even if you are familiar, I think we push past the basics and articulate some novel arguments too. You can read the whole piece over at Asterisk right now. Read the Debate Now

If you prefer audio, Tamay and I also recorded a podcast version where we each perform our parts of the dialogue. That one should be ready in the next 24 hours - it’ll show up first at this link, and then on your local podcast app a bit later.

Cheers, Matt

1 I actually covered some of Tamay’s work on New Things Under the Sun here!
And now for something completely different
This short post is to announce the launch of a new living literature review, on a topic almost the opposite of New Things Under the Sun: Existential Crunch, by Florian Jehn!

Existential Crunch
Thoughts about existential risk, history, climate, food security and other large scale topics. By Florian U. Jehn

Existential Crunch is about societal collapse, and what academic research has to say about it. The first post takes a tour of the major schools of thought on this topic: Gibbon, Malthus, Tainter, Turchin and more. As the post says in its closing:

My main takeaway is that this field still has a long way to go. This is troubling, because in our society today we can see signs that could be interpreted as indications of a nearing collapse. There are voices warning that our global society has become decadent (writers like Ross Douthat), that we are pushing against environmental limits (for example, Extinction Rebellion), that we are having a decreasing return on investment for our energy system (for example, work by David Murphy) and that there has been an overproduction of elites in the last decades (writers like Noah Smith). This means we have warning signs that fit all major viewpoints on collapse. Moreover, new technological capabilities pose novel dangers that require us to extrapolate beyond the domain of historical experience. All this means that understanding how collapse really happens is rather urgent.

If we want innovation and progress to continue (and I certainly do!), understanding how it dies seems, uh, important! Check it out, and sign up for the Substack here.

Why am I telling you about this? Well, one of the reasons I was excited to join Open Philanthropy was for the opportunity to support more living literature reviews, on a diverse array of topics. This is the first such review we’ve supported, but we’re interested in financially supporting more via the newly launched innovation policy program. We’re especially interested in people who want to write reviews on policy-relevant topics. For us, a living literature review is an online collection of short, accessible articles that synthesize academic research, updated as the literature evolves, and written by a single qualified individual (for example, Florian has published related academic work). If you’re interested, go here for more info. And if you know people who you think would be a good fit for this kind of thing, please let them know about this opportunity.
The Size of Firms and the Nature of Innovation
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

Special note: Up until now, everything on New Things Under the Sun has been written by me. This post is the first ever collaboration! My coauthor is Arnaud Dyèvre (@ArnaudDyevre), a PhD student at the London School of Economics working on growth and the economic returns to publicly funded R&D. I think this turned out great and so I wanted to extend an invitation to the rest of you - if you want to coauthor a New Things post with me, go here to learn more about what I’m looking for and what the process would be like. One last thing: I want to assure readers that, although this is a collaboration, I’ve read all the major papers discussed in the post. I view part of my job as making connections between papers, and I think that works best if all the papers covered in this newsletter are bouncing around in my brain, rather than split across different heads. On to the post!

We are used to thinking about income inequality between individuals, but inequality between firms is vastly larger. In the US, the richest 1% of individuals earned about 20% of all income in 2018.1 In contrast, the top 1% of US firms by sales accounted for about 80% of all sales in 2018. The economy is populated by a few “superfirms” and a multitude of small- to medium-size businesses. And this disparity is getting more extreme over time.2 Does this huge disparity in firm size matter for innovation and technological progress? Do big firms differ in the type of R&D they do, and if so, why? The academic literature about the empirical link between firm size and innovation is an old one, dating back to the 1960s at least,3 and we do not have space to do it full justice here. Instead, in this post we’ll focus on work using a variety of approaches to document that there are important differences in how innovation varies across firm sizes. In a follow-up post, we’ll examine some explanations for why. One quick point before digging in: when economists talk about firm size, they typically refer to a firm’s total sales or (more rarely) its employment count. Defined in this way, firm size is often used as an imperfect proxy for the number of business units of a firm (i.e. the number of product lines it has).

Fact 1: Firm size and R&D rise proportionally

The first important fact about firm heterogeneity and innovation is that corporate R&D expenditures scale up proportionately with sales. In other words, when sales double, money spent on R&D doubles too. This doesn’t have to be the case: for example, it has been shown that other inputs in production such as labor4 and capital5 do not scale proportionately with firm sales (less than proportionately for labor, more than proportionately for capital). This proportional relationship has been shown time and again, at least for firms above a certain size who do at least some R&D.6

To illustrate this point, the figure below shows the relationship between firm sales and R&D expenses among publicly traded firms who report doing some R&D. The data is from Compustat (a database of publicly listed firms) and each dot represents 750 firm-by-year observations. In this graph, we control for year and fine sector (SIC4) so that the variation we isolate is across firms, within a year and within a sector.7 The slope is strikingly close to 1 on a log-log plot, meaning that the typical publicly listed firm increases its R&D expenditures by 10% when its size increases by 10%.

Firm R&D expenditures by firm sales (log plot). Notes: Graph generated by Arnaud Dyèvre, with data on US publicly listed firms from Compustat. The sample only includes firms that report some R&D expenditures in a year. Sales and R&D are deflated using the Bureau of Labor Statistics CPI.
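Here is a minimal sketch of the kind of regression behind that statement - estimating the elasticity of R&D with respect to sales on a log-log scale, with year and sector fixed effects (hypothetical column names and the statsmodels library; not Arnaud’s actual code):

```python
# Minimal sketch: estimate the elasticity of R&D spending with respect to
# sales, controlling for year and fine-sector (SIC4) fixed effects.
# Hypothetical column names; not the actual Compustat pipeline.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# df has one row per firm per year: firm_id, year, sic4, sales, rd
df = pd.read_csv("compustat_panel.csv")  # hypothetical file
df = df[(df["sales"] > 0) & (df["rd"] > 0)]
df["log_sales"] = np.log(df["sales"])
df["log_rd"] = np.log(df["rd"])

# The coefficient on log_sales is the elasticity of R&D with respect to sales;
# a value close to 1 means R&D rises proportionally with sales.
model = smf.ols("log_rd ~ log_sales + C(year) + C(sic4)", data=df)
print(model.fit().params["log_sales"])
```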
This finding was first observed in the 1960s and has been reproduced across many studies since. In the figure below, from a seminal 1982 study by Bound, Cummins, Griliches, Hall and Jaffe, the authors have plotted the log R&D expenditures of a panel of 2,600 manufacturing firms, as a function of their log sales, in 1976. The same proportional relationship is observed.

Firm R&D expenditures by firm sales (log plot). Data from Bound, Cummins, Griliches, Hall and Jaffe (1982)

The 1-to-1 proportionality of R&D to sales may lead one to conclude that the immense heterogeneity in firm sizes does not matter for the aggregate level of innovation. After all, if R&D scales proportionately with firm size, then an economy consisting of 10 firms with $1 billion in sales each will spend as much on R&D as an economy consisting of one firm with $10 billion in sales. But as we’ll see, this conclusion would be erroneous.

Fact 2: Larger firms get fewer inventions per R&D dollar

A variety of different lines of evidence show that firms get fewer inventions per R&D dollar as they grow. Let’s start with patents (we’ll talk about non-patent evidence in a minute). The 1982 study by Bound, Cummins, Griliches, Hall and Jaffe mentioned earlier found that firms with larger R&D programs get fewer patents per dollar of R&D. Their result is summarized in Figure 3 (panel A) below; it shows an exponential decrease in the number of patents per R&D dollar as one moves up firms’ log R&D expenditures. In a more recent and more comprehensive exploration of this relationship, Akcigit & Kerr (2018) use the universe of firms in the US matched to patents to document that patents per employee also decrease exponentially as a function of log employment (panel B). The relationships shown in the figures are very similar and suggest that bigger firms are getting fewer patents per productive unit—employment or R&D dollar.

Left: Patents per dollar of R&D as a function of total R&D expenditures (x-scale in log). From Bound, Cummins, Griliches, Hall & Jaffe (1982). Right: Patents per employee as a function of total employee count (x-scale in log). From Akcigit & Kerr (2018).

Patents are not synonymous with invention though. It could, for example, be that as firms grow larger they create just as many inventions per R&D dollar, but they become less likely to use patents to protect their work. But in fact, the opposite seems to be true. Mezzanotti and Simcoe (2022) report on the Business R&D and Innovation Survey, which was conducted between 2008 and 2015 by the US Census Bureau and the National Science Foundation. This survey asked more than 40,000 US firms, from a nationally representative sample, about their use of intellectual property. They find larger firms are much more likely to rate patents as important.
For example, 69% of firms with more than $1bn in annual sales rate patents as somewhat or very important, compared to just 24% of firms with annual sales below $10mn. This relationship also holds when you compare responses across firms belonging to the same sector, in the same year. In other words, if we had a perfect measure of innovation that is not affected by selection like patenting is, we would find an even stronger negative relationship between firm size and patents per R&D dollar or per employee. Small firms have more patents per employee or R&D dollar, in spite of being less likely to file patents than big firms.

Other empirical studies of innovation have relied on different measures of innovative output and have reached a similar conclusion. In a creative 2006 study of the financial service industry, Josh Lerner uses news articles from the Wall Street Journal to identify new products and services introduced by financial institutions. For example, if a story about a new security or the first online banking platform appears in the WSJ, Lerner counts it as an innovation and attributes it to a bank in the Compustat database. Consistent with papers using patent data, he finds that innovation intensity scales less than proportionately with firm size. (Note that Lerner measures size as the log of assets here rather than log sales, due to the nature of the industry studied.)

You can also look for the introduction of innovations in other places. In 1982, the US Small Business Administration created a database of new products, processes or services in 100 technology, engineering or trade journals, and linked these inventions to firms. In their 1987 paper using this data, Acs & Audretsch also find that larger firms have fewer innovations per employee and fewer innovations per dollar of sales than small firms. (Though they emphasize that this isn’t universal; in some industries, large firms produce more innovations per dollar than small firms - but this isn’t typical.)

Finally, Argente et al. (2023) use product-scanner data in the consumer goods sector over 2006-2015 to obtain details on every product sold in a large sample of grocery, drug, and general-merchandise stores, including the associated firm that markets the product. Here, they identify innovation as the introduction of a new product; as the figure below illustrates, bigger firms consistently introduce fewer new products, relative to the number of products they already sell (gray line below).

From Argente et al. (2023)

Of course, not all new products are equally innovative. To deal with this issue, Argente and coauthors use data on the attributes of each product. Since they know the price and sales of each product, they can run statistical models to estimate a dollar value consumers put on different product attributes. They can then “quality adjust” new product introductions by the introduction of products that include new attributes, where attributes are given more weight if associated with higher prices (or sales). This more sophisticated approach yields the same result: when you adjust for quality, you still find that larger firms are less innovative (relative to their size) than small...
When Technology Goes Bad
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here. Innovation has, historically, been pretty good for humanity. Economists view long-run progress in material living standards as primarily resulting from improving technology, which, in turn, emerges from the processes of innovation. Material living standards aren’t everything, but I think you can make a pretty good case that they tend to enable human flourishing better than feasible alternatives (this post from Jason Crawford reflects my views pretty well). In general, the return on R&D has been very good, and most of the attention on this website is viewed through a lens of how to get more of it. But technology is just a tool, and tools can be used for good or evil purposes. So far, technology has skewed towards “good” rather than evil but there are some reasons to worry things may differ in the future. Subscribe now Why is technology good for us, on average? I think technological progress has skewed good through most history for a few reasons. First, invention takes work, and people don’t do work unless they expect to benefit. The primary ways you can benefit from invention are either directly, by using your new invention yourself, or indirectly, by trading the technology for something else. To benefit from trade, you need to find technologies that others want, and so generally people invent technologies they think will benefit people (themselves or others), rather than harm them. Second, invention is a lot of work, and that makes it harder to develop technology whose primary purpose is to harm others. Frontier technological and scientific research is conducted by ever larger teams of specialists, and overall pushing the scientific or technological envelope seems to be getting harder. The upshot of all this is technological progress increasingly requires the cooperation of many highly skilled individuals. This makes it hard for people who want to invent technologies that harm others (even while benefitting themselves). While people who are trying to invent technologies to benefit mankind can openly seek collaborators and communicate what they are working on, those working on technologies to harm or oppress must do so clandestinely or be stopped. Third and finally, the technological capabilities of the people trying to stop bad technology from being developed grow with the march of technological progress. Think of surveillance technology in all its forms: wiretaps, satellite surveillance, wastewater monitoring for novel pathogens, and so on. Since it’s easier to develop technologies for beneficial use when you can be open about your work, then that will tend to boost the powers of those empowered to represent the common interest. In a democracy, that process will tend to hand more powerful tools to the people trying to stop the development of harmful technologies. Now - these tendencies have never been strong enough to guarantee technology is always good. Far from it. Sometimes technologies have unappreciated negative effects: think carbon emitting fossil fuels. Other times, large organizations successfully collaborate in secret to develop harmful technology: think military research. In other cases, authoritarian organizations use technological power to oppress. But on the whole, I think these biases have mitigated much of the worst that technology could do to us. 
But I worry a new technology - artificial intelligence - risks upending these dynamics. Most stories about the risks of AI revolve around AIs developing goals that are not aligned with human flourishing; such a technology might have no hesitation creating technologies that hurt us. But I don't think we even need to posit the existence of AIs with unaligned goals of their own to be a bit concerned. Simply imagine a smart, moderately wealthy, but highly disturbed individual teaming up with a large language model trained on the entire scientific corpus, working together to develop potent bioweapons. More generally, artificial intelligence could make frontier science and technology much easier, making it accessible to small groups, or even individuals without highly specialized skills. That would mean the historic skew of new science and technology being used for good rather than evil would be weakened.1

What does science and technology policy look like in a world where we can no longer assume that more innovation generally leads to more human flourishing? It's hard to say too much about such an abstract question, but a number of economic growth models have grappled with this idea.

Don't Stop Till You Get Enough

Jones (2016) and Jones (2023) both consider the question of the desirability of technological progress in a world where progress can sometimes get you killed. In each paper, Jones sets up a simple model where people enjoy two different things: having stuff and being alive. Throughout this post, you can think of "stuff" as meaning all the goods and services we produce for each other; socks and shoes, but also prestige television and poetry.

So let's assume we have a choice: innovate or not. If we innovate, we increase our pile of stuff by some constant proportion (for example, GDP per capita tends to go up by about 2% per year), but we face some small probability we invent something that kills us. What do we do? As Jones shows, it all depends on the tradeoff between stuff and being alive. As is common in economics, he assumes there is some kind of "all-things-considered" measure of human preferences called "utility," which you can think of as comprising happiness, meaning, satisfaction, flourishing, etc. - all the stuff that ultimately makes life worth living. Most models of human decision-making assume that our utility increases by less-and-less as we get more-and-more stuff. If this effect is very strong, so that we very quickly get tired of having more stuff, then Jones (2016) shows we eventually hit a point where the innovation-safety tradeoff is no longer worth it. At some point we get rich enough that we choose to shut down growth, rather than risk losing everything we have on a little bit more. On the other hand, if the tendency for more stuff to increase utility by less-and-less is weak, then we may always choose to roll the dice for a little bit more.

As a concrete illustration (not meant to be a forecast), Jones (2023) imagines a scenario where using artificial intelligence can increase annual GDP per capita growth from 2% per year to 10% per year, but with an annual 1% risk that it kills us all. Jones considers two different models of human preferences. In one of them, increasing our stuff by a given proportion (say, doubling it) always increases our utility by the same amount. If that is how humans balance the tradeoff between stuff and being alive, it implies we would actually take big gambles with our lives for more stuff.
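To get a feel for how quickly that gamble compounds, here is a back-of-the-envelope calculation - just arithmetic on the illustrative numbers above (10% annual growth, 1% annual risk of catastrophe) with continuous compounding, not Jones's actual model.

```python
# Back-of-the-envelope compounding of the illustrative scenario in the text:
# AI raises growth to 10% per year but carries a 1% annual risk of catastrophe.
# This is just arithmetic, not Jones's model.
from math import exp

def run_ai_for(years, growth=0.10, risk=0.01):
    income_multiple = exp(growth * years)   # how much richer we end up
    survival_prob = exp(-risk * years)      # chance the catastrophe never happens
    return income_multiple, survival_prob

for t in (5, 10, 20, 40):
    mult, surv = run_ai_for(t)
    print(f"{t:>2} years: income x{mult:5.1f}, survival probability {surv:.0%}")
```

Whether terms like these look attractive depends entirely on which of the two preference specifications you adopt, as the next paragraph shows.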
Jones' model implies we would let AI run for 40 years, which would increase our income more than 50-fold, but the AI would kill us all with 1/3 probability! On the other hand, he also considers a model where there is some maximum feasible utility for humans; with more-and-more stuff, we get closer-and-closer to this theoretical maximum, but can never quite reach it. That implies increasing our pile of stuff by a constant proportion increases utility by less and less. If that is how humans balance the tradeoff between having stuff and being alive, we're much more cautious. Jones' model implies in this setting we would let AI operate for just 4-5 years. That would increase our income by about 50%, and the AI would kill us all with "just" 4% probability. But after our income grows by 50%, we would be in a position where a 10% increase in our stuff wouldn't be worth a 1% chance that we lose it all.

Different Kinds of Progress

The common result is that, as we get sufficiently rich, we are increasingly willing to sacrifice economic growth in exchange for reduced risks to our lives. That's a good place to start, but it's a bit too blunt an instrument: we actually have more options available than merely "full steam ahead" and "stop!" A variety of papers - including Jones (2016) - take a more nuanced approach and imagine there are two kinds of technology. The first is as described above: it increases our stuff, but doesn't help (and may hurt) our health. The second is a "safety" technology: it doesn't increase our stuff, but it does increase our probability of survival.

"Safety" technology is a big category. Plausible technologies in this category could include:

- Life-saving medical technology
- Seatbelts and parachutes
- Renewable energy
- Carbon capture and removal technology
- Crimefighting technology
- Organizational innovations that reduce the prospects of inadvertent nuclear first strikes
- AI alignment research
- And many others

The common denominator is that safety technologies reduce dangers to us as individuals, or as a species, but generate less economic growth than normal technologies. In addition to the model discussed above, Jones (2016) builds a second model where scientists face a choice about what kind of technologies to work on. The model starts with a standard model of economic growth, where technological progress does not tend to increase your risk of dying (whew!). But we still do die in this model, and Jones assumes people can reduce their probability of dying by purchasing safety technologies. Scientists and inventors, in turn, can choose to work on "normal" technology that makes people richer, or safety technology, which makes them live longer. There's a market for each. This gives you a result similar in spirit to the one discussed above: as people get richer, the tradeoff between stuff and survival starts to tilt increasingly towards survival. If peopl...
Can taste beat peer review?
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

Note: Have an idea for a research project about how to improve our scientific institutions? Consider applying for a grant of up to $10,000 from the Metascience Challenge on experiment.com, led by Paul Niehaus, Caleb Watney, and Heidi Williams. From their call for proposals: We're open to a broad set of proposals to improve science -- for example, experimental designs, surveys, qualitative interviews with scientists, pilot programs for new mechanisms, scientific talent development strategies, and other research outputs that may be relevant for scientific research funders. The deadline to apply is April 30. On to our regularly scheduled programming!

Scientific peer review is widely used as a way to distribute scarce resources in academic science, whether those are scarce research dollars or scarce journal pages.1 Peer review is, on average, predictive of the eventual scientific impact of research proposals and journal articles, though not super strongly. In some sense, that's quite unsurprising; most of our measures of scientific impact are, to some degree, about how the scientific community perceives the merit of your work: do they want to let it into a journal? Do they want to cite it? It's not surprising that polling a few people from a given community is mildly predictive of that community's views.

At the same time, peer review has several potential shortcomings:

- Multiple people reading and commenting on the same document costs more than having just one person do it
- Current peer review practices provide little incentive to do a great job at peer review
- Peer review may lead to biases against riskier proposals

One alternative is to empower individuals to make decisions about how to allocate scientific resources. Indeed, we do this with journal editors and grant makers, though generally in consultation with peer review. Under what conditions might we expect individuals empowered to exercise independent judgement to outperform peer review?

To begin, while peer review does seem to add value, it doesn't seem to add a ton of value; at the NIH, top-scoring proposals aren't that much better than average, in terms of their eventual probability of leading to a hit (see this for more discussion). Maybe individuals selected for their scientific taste can do better, in the same way some people seem to have an unusual knack for forecasting.

Second, peer reviewers are only really accountable for their recommendations insofar as it affects their professional reputations. And often they are anonymous, except to a journal editor or program manager. That doesn't lead to strong incentives to try and really pin down the likely scientific contribution of a proposal or article. To the extent it is possible to make better judgments by exerting more effort, we might expect better decision-making from people who have more of their professional reputation on the line, such as editors and grant-makers.

Third, the very process of peer review may lead to risk aversion. Individual judgment, relying on a different process, may be able to avoid these pitfalls, at least if taking risks is aligned with professional incentives. Alternatively, it could be that a tolerance for risk is a rare trait in individuals, so that most peer reviewers are risk averse.
If so, a grant-maker or journal that wants to encourage risk could do so by seeking out (rare) risk-loving individuals, and putting them in decision-making roles.

Lastly, another feature of peer review is that most proposals or papers are evaluated independently of each other. But it may make sense for a grant-maker or journal to adopt a broader, portfolio-based strategy for selecting science, sometimes elevating projects with lower scores if they fit into a broader strategy. For example, maybe a grant-maker would want to support in parallel a variety of distinct approaches to a problem, to maximize the chances at least one will succeed. Or maybe they will want to fund mutually synergistic scientific projects.

We have a bit of evidence that empowered individual decision-makers can indeed offer some of these advantages (often in consultation with peer review).

Picking Winners Before Research

To start, Wagner and Alexander (2013) is an evaluation of the NSF's Small Grants for Exploratory Research program. This program, which ran from 1990-2006, allowed NSF program managers to bypass peer review and award small short-term grants (up to $200,000 over 2 years).2 Proposals were short (just a few pages), made in consultation with the program manager (but not other external review), and processed fast. The idea was to provide a way for program managers to fund risky and speculative projects that might not have made it through normal peer review. Over its 16 years, the SGER (or "sugar") program disbursed $284mn via nearly 5,000 awards.

Wagner and Alexander argue the SGER program was a big success. By the time of their study, about two thirds of SGER recipients had used their results to apply for larger grant funding from the conventional NSF programs, and of those that applied 80% were successful (at least, among those who had received a decision). They also specifically identify a number of "spectacular" successes, where SGER provided seed funding for highly transformative research (judged as such from a survey of SGER awardees and program managers, coupled with citation analysis).

Indeed, Wagner and Alexander's main critique of the program is that it was insufficiently used. Up to 5% of agency funds could be allocated to the program, but a 2001 study found only 0.6% of the budget actually was. Wagner and Alexander also argue that, by their criteria, around 10% of funded projects were associated with transformational research, whereas a 2007 report by the NSF suggests research should be transformational about 3% of the time. That suggests perhaps program managers were not taking enough risks with the program. Moreover, in a survey of awardees, 25% said an 'extremely important' reason for pursuing an SGER grant was that their proposed research idea would be seen as either too high-risk, too novel, too controversial, or too opposed to the status quo for a peer review panel. That's a large fraction, but it's not a majority (though the paper doesn't report the share who rated these factors as important but not extremely important). Again, maybe the high-risk program was not taking enough risks! In general though, the SGER program's experience seems to support the idea that individual decision-makers can do a decent job supporting less conventional research.

Goldstein and Kearney (2018) is another look at how well discretion compares to peer review, this time in the context of the Advanced Research Projects Agency - Energy (ARPA-E).
ARPA-E does not function like a traditional scientific grant-maker, where most of the money is handed out to scientists who independently propose projects for broadly defined research priorities. Instead, ARPA-E is composed of program managers who are goal oriented, seeking to fund research projects in the service of overcoming specific technological challenges. Proposals are solicited and scored by peer reviewers along several criteria, on a five-point scale. But program managers are very autonomous and do not simply defer to peer review; instead, they decide what to fund in terms of how proposals fit into their overall vision. Indeed, in interviews conducted by Goldstein and Kearney, program managers report that they explicitly think of their funded proposals as constituting a portfolio, and will often fund diverse projects (to better ensure at least one approach succeeds), rather than merely the highest scoring proposals.

From Goldstein and Kearney (2018)

Goldstein and Kearney have data on 1,216 proposals made up through the end of 2015. They want to see what kinds of projects program managers select, and in particular, how they use their peer review feedback. Overall, they find proposals with higher average peer review scores are more likely to get funded, but the effects are pretty weak, explaining about 13% of the variation in what gets funded. The figure above shows the average peer review scores for 74 different proposals to the "Batteries for Electrical Energy Storage in Transportation" program: filled in circles were funded. As you can see, program managers picked many projects outside the top.

From Goldstein and Kearney (2018)

What do ARPA-E program managers look at, besides the average peer review score? Goldstein and Kearney argue that they are very open to proposals with highly divergent scores, so long as at least one of the peer review reports is very good. Above, we have the same proposals to the Batteries program, but instead of ordering them by their average peer review score, we're now ordering them by their maximum peer review score. Seen this way, more of the funded proposals cluster around the highest scores. This is true beyond the battery program: across all 1,216 project proposals, for a given average score, the probability of being funded is higher if the proposal receives a wider range of peer review scores. Goldstein and Kearney also find proposals are more likely to be funded if they are described as "creative" by peer reviewers, even after taking into account the average peer review score.

ARPA-E was first funded in 2009, and this study took place in 2018, using proposals made up through 2015. So there hasn't been a ton of time to assess how well the program has worked. But Goldstein and Kearney do an initial analysis to see how well projects turn out when program managers use their discretion to override peer review. To do this, they divid...
What does peer review know?
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

People rag on peer review a lot (including, occasionally, New Things Under the Sun). Yet it remains one of the most common ways to allocate scientific resources, whether those be R&D dollars or slots in journals. Is this all a mistake? Or does peer review help in its purported goal to identify the science most likely to have an impact and hence, perhaps most deserving of some of those limited scientific resources? A simple way to check is to compare peer review scores to other metrics of subsequent scientific impact; does peer review predict eventual impact? A number of studies find it does.

Peer Review at the NIH

Let's start with peer review at the stage of reviewing research proposals. Li and Agha (2015) looks at more than 100,000 research projects funded by the NIH over 1980-2008, comparing the percentile rank of the application peer review scores to the outcomes of these research projects down the road. For each grant, they look for publications (and patents) that acknowledge the grant's support. Besides counting the number of publications and patents each grant results in, they can also see how often the publications are cited. Note, they are only looking at projects that actually were funded by the NIH, so we don't need to worry that their results are just picking up differences between funded and unfunded projects.

The upshot is, better peer review scores are correlated with more impact, whether you want to measure that as the number of resulting journal articles, patents, or citations. For example, here's a scatter plot of the raw data, comparing peer review percentile ranks (lower is better) to citations and publications. Lots of noise, but among funded projects, if people think your proposal is stronger, you're more likely to get publications and citations.

From Li and Agha (2015)

Li and Agha also look at the correlation between peer review scores and impact measures after controlling for other potentially relevant factors, such as the year or field of the grant, or the PI's publication history, institution, and career characteristics. The results are moderated a bit, but basically still stand - compare two grants in the same year, in the same study section, from PIs who look pretty similar on paper, and the grant with higher peer review scores will tend to produce more papers and patents, receive more citations, and produce more very highly cited papers.

Among funded proposals, the predictive power of peer review seems to be highest at the top; the difference in citations, for example, between a top-scoring proposal and one at the 20th percentile tends to be much larger than the difference in citations between one at the 20th and 40th percentile.1 Moreover, even at the top, the correlation between peer review scores and outcomes isn't great. If you compare proposals that score at the top to proposals at the 10th percentile (of grants that were ultimately still funded), the top proposal is twice as likely to result in a one-in-a-thousand top cited paper. I think that's not actually that high - since a 10th percentile proposal isn't that far off from the average, if peer review was really accurate, you might have expected the top proposal to be something like ten times as likely to produce a hit paper as an average proposal.
Park, Lee, and Kim (2015) exploits a peculiar moment in NIH history to provide further evidence that the NIH peer review processes, on average, pick projects with higher scientific impact. In 2009, the US government passed the American Recovery and Reinvestment Act, a stimulus bill meant to fight the economic headwinds of the 2008 financial crisis. The bill authorized $831bn in new spending, of which a tiny corner, $1.7bn, was used by the NIH to fund research projects that would not normally have been funded. This provides a rare opportunity to see how good projects that would otherwise have been rejected by the NIH (which relies heavily on peer review to select projects) fare when they unexpectedly receive funding.

When Park, Lee, and Kim (2015) compare stimulus-funded proposals (which got lower peer review scores) to normally funded proposals, they find the stimulus-funded proposals tend to lead to fewer publications and that these publications tended to receive fewer citations. On average, a research proposal with peer review scores high enough to be funded under the NIH's normal budget produces 13% more publications than a stimulus-funded project. If we focus on a proposal's most high-impact publication (in terms of citations), Park and coauthors find proposals funded only because of the stimulus got 7% fewer citations. Lastly, we can look at the 5% of publications funded by these NIH grants that received the highest number of citations. A normally funded research proposal had a 7% chance of producing one of these "highest impact" papers; a stimulus-funded proposal had a 4% chance of producing one.

I think these results are pretty consistent with Li and Agha (2015) in a few ways. They replicate the general finding that in the NIH, higher peer review scores are associated with more research impact (as measured with imperfect quantitative methods). But they also find peer review doesn't have super forecasting acumen. Note that Park, Lee, and Kim are not comparing proposals that just barely clear the NIH's normal funding threshold to proposals that just barely miss it - they don't have the data needed for that. Instead, they are comparing the entire batch of proposals rated above the NIH's normal funding threshold to a batch of proposals that fall uniformly below it. The batch of normally funded proposals includes the ones that were rated very highly by peer review, which Li and Agha's work suggests is where peer review tends to work best. Even so, the differences Park, Lee, and Kim find aren't enormous.

Peer Review at Journals

We have some similar results about the correlation between peer review scores and citations at the publication stage too. As discussed in more detail in Do academic citations measure the impact of new ideas?, Card and DellaVigna (2020) have data on about 30,000 submissions to four top economics journals, including data on their peer review scores over (roughly) 2004-2013. Because, in economics, it is quite common for draft versions of papers to be posted in advance of publication, Card and DellaVigna can see what happens to papers that are accepted or rejected from these journals, including how many citations they go on to receive (both as drafts and published versions). As with Li and Agha (2015), they find there is indeed a positive correlation between the recommendation of reviewers and the probability a paper is among the top 2% most highly cited in the journal.
From Card and DellaVigna (2020)

Nor is this simply because high peer review scores lead to publication in top economics journals (though that's also true). Card and DellaVigna also track the fate of rejected articles and find that even among rejects to these journals, those that get higher peer review scores still go on to receive more citations.

Siler, Lee, and Bero (2014) obtain similar results using a smaller sample of submissions to the Annals of Internal Medicine, the British Medical Journal, and The Lancet over 2003 and 2004. For a sample of 139 submissions that received at least two peer review scores, they can track down the eventual fate of the submission (either published in one of these three journals or another). Among the 89 peer-reviewed submissions that were ultimately rejected, the peer review scores (from the first, initial review) were positively correlated with the number of citations the submissions eventually received, though the correlation was pretty weak. For the 40 submissions that were reviewed and accepted, the initial peer review reports were again positively correlated with the number of citations eventually received. In this latter case, the correlation was too weak to be confident it's not just noise (possibly because the sample was so small). Siler, Lee, and Bero also emphasize that the three journals actually rejected the 14 papers that would go on to receive the most citations (though they did manage to get the 15th!).

From Siler, Lee, and Bero (2014)

Perhaps more reassuring is the fact that, generally speaking, papers that went on to be highly cited tended to be identified as publishable by other journals pretty quickly. The figure below compares the eventual number of citations received to the time elapsed between submission to one of the three journals under study and eventual publication somewhere else. No highly cited papers took longer than 500 days (not great, but better than 2000!) to find a home. That could be because peer review at one of the next journals the paper was submitted to was quick to recognize the quality of these articles, or possibly because the authors rapidly resubmitted after getting favorable feedback from initial peer reviewers. But this evidence is pretty indirect and other explanations are also possible (for example, maybe the authors believed in the papers' merit and submitted them more frequently for review, or they were more frequently desk-rejected and so could be resubmitted fast).

From Siler, Lee, and Bero (2014)

That said, we also have one more study looking at peer review reports and eventual impact, this time in the American Sociological Review. Teplitskiy and Bakanic (2016) have data on 167 articles published in the American Sociological Review in the 1970s, as well as their peer review scores. Among this set of published articles, they find no statistically significant relationship between peer review scores and the number of citations papers go on to earn. After a...
Biases Against Risky Research
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

Recommendation: My Open Philanthropy colleague Ajeya Cotra has teamed up with Kelsey Piper at Vox to launch a newsletter about "a possible future in which AI is functionally making all the most important decisions in our economy and society." I would have put the newsletter on my substack recommendations, but it's not on substack, so I'm plugging it here. If you are thinking about AI these days - and who isn't? - check it out!

A frequent worry is that our scientific institutions are risk-averse and shy away from funding transformative research projects that are high risk, in favor of relatively safe and incremental science. Why might that be? Let's start with the assumption that high-risk, high-reward research proposals are polarizing: some people love them, some hate them. It's not actually clear this is true,1 but it seems plausible and for the purposes of this post I'm just going to take it as given. If this is true, and if our scientific institutions pay closer attention to bad reviews than good reviews, then that could be a driver of risk aversion. Let's look at three channels through which negative assessments may have outsized weight in decision-making, and how this might bias science away from transformative research.

Reviewer Preferences

Let's start with individual reviewers: how does the typical scientist feel about riskier research? As far as I know, we don't have good data directly on how academic peer reviewers feel about high-risk / high-reward research proposals. There is some work on how academic scientists treat novelty at the publication stage, but there might be some big differences between how risky research is judged at the proposal versus the publication stage (an argument developed in more detail in Gross and Bergstrom 2021). For one, after the research is done, you can often see if the risk paid off! In this post I'm going to focus on work looking at research proposals, and to learn about the preferences of peer reviewers, I'm going to look at Krieger and Nanda (2022), which provides some granular information about how working scientists in industry think about which kinds of pharmaceutical research projects to fund.

Krieger and Nanda study an internal startup program at the giant pharmaceutical company Novartis. The program was meant to identify and rapidly fund "transformative, breakthrough innovation" developed by teams of scientists working within Novartis. Over 150 Novartis teams submitted applications for the funding, and these were screened down to a shortlist of 12 who pitched their proposal to a selection committee. These pitches were made over video chat, due to covid-19, which meant they could be viewed by lots of people at once. About 60 additional Novartis research scientists watched some or all of the pitches, and Krieger and Nanda got them to score each research proposal on a variety of criteria, and then to allocate hypothetical money to the different proposals. What's particularly interesting for us is that we can see how scientists rated different aspects of a proposal, and how that relates to their ultimate decision about what to (hypothetically) fund.
Participants in the study rated each proposal on:

- Transformative potential (more creative, non-standard is better)
- Breadth of applicability (more and higher value propositions)
- Timescale to first prototype (within 18 months is better)
- Feasibility/path to execution (more feasible is better)
- Team (does the team have the skill and network to achieve the goal?)

These different scores were aggregated into a weighted average that put extra weight on feasibility and the team, but put the most weight on a proposal's transformative potential. (After all, that's what the program was set up to fund.) Next, the study participants are asked how much money from a hypothetical budget to allocate to different projects. Note, when they're doing this allocation, they can clearly see the weighted average of the scores they gave on each criterion, so it is obvious which proposals are supposed to get funding, if you strictly follow the scoring formula that Novartis devised.

No surprise, Krieger and Nanda find that proposals with a higher score tend to get more hypothetical funding. But they also find, all else equal, reviewers penalize projects that have greater variation among the different criteria. That is, when comparing two projects with the same weighted average, study participants give more money to a project if most of its criteria are close to the overall weighted average, and less money if some criteria are well above the average and some well below. That implies negative attributes of a project "count" for more in the minds of reviewers. Even if bad scores on some criteria are counterbalanced by higher scores on others, these kinds of projects still get less (hypothetical) funding than less uneven proposals.

But we can be even more precise. This bias against proposals with low scores on some dimensions and high scores on others is mostly driven by a particular type of divergence: proposals rated as having a high transformative potential but low feasibility tend to be the most penalized. That's consistent with peer reviewers themselves being a source of bias against novel projects. They can recognize a project is high-risk and high-reward, but when asked which projects to give research funding to, they shy away from them in favor of lower-risk but lower-reward projects. Note, though, that this data is from industry scientists, and maybe their risk preferences differ from those of their academic peers. So interpret with caution. Let's next turn to some studies specifically about academia.

Random Averages

The previous section was about possible biases among individual reviewers. But most of the time, research proposals are evaluated by multiple reviewers, and then the scores across reviewers are averaged. And that system can introduce different problems. One way that averaging across reviewers leads to sensitivity to negative reviews is the fact that money for science tends to be tight, which means only research proposals that receive high average scores tend to be funded. If a single negative review can pull your score below this funding threshold, then negative reviews may exert excessive influence. For example, proposals submitted to the UK's Economic and Social Research Council (ESRC) are typically scored by 3-4 reviewers on a 6-point scale, and usually only proposals that receive average scores above 4.5 make it to the stage where a panel deliberates on which proposals to fund.
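Before turning to the data, a toy calculation (my own illustration, not the paper's) shows how much leverage a single dissenting reviewer gets when scores on a 6-point scale are averaged against a cutoff of roughly 4.5.

```python
# Toy illustration (not from Jerrim and de Vries): with scores averaged on a
# 6-point scale and only averages above ~4.5 moving forward, one low score
# has far more leverage than one lukewarm score.
THRESHOLD = 4.5

def panel_average(scores):
    avg = sum(scores) / len(scores)
    verdict = "clears the bar" if avg > THRESHOLD else "falls short"
    print(f"scores {scores} -> average {avg:.2f} ({verdict})")

panel_average([6, 6, 6])        # three enthusiastic reviewers
panel_average([6, 6, 6, 4])     # add a lukewarm fourth reviewer: average 5.50
panel_average([6, 6, 6, 1])     # add a hostile fourth reviewer instead: average 4.75
panel_average([6, 6, 1])        # one hostile reviewer out of three sinks it below the bar
```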
Jerrim and de Vries (2020) look at over 4,000 ESRC research proposals made over 2013-2019 and find 81% of proposals with an average score of 5.75-6 from the peer reviewers get funded, but only 24% of proposals with an average score of 4.5-5. That is to say, if you have three reviewers who love a proposal and rate it a maximum 6/6, it'll be funded 81% of the time, but if you add one more reviewer who hates it and gives it a 1/6, then the average of 4.75 implies it only has a 24% chance of being funded.

Of course, maybe that's a feature, not a bug, if negative reviews actually do spot serious weaknesses. But before getting into that, we might first ask if this scenario is actually plausible in the first place: could it really be the case that three people rate a project 6/6 and another rates it 1/6? If three people think a project is outstanding, isn't it pretty unlikely that a fourth person would think it's actually poor? This gets into the question of how consistent peer review scores are with each other, which is itself a large literature. But at least for their sample of ESRC proposals, Jerrim and de Vries find inter-reviewer correlations are very weak. Any particular reviewer's score is only a tiny bit predictive of their peers' scores. That means a score of 1/6 is less likely when three other reviewers rate it 6/6 - but not that much less likely than random (though on average only 4% of reviewers give proposals a score of 1/6). So it is true that one really bad review can substantially reduce the probability of getting funded.

But that doesn't necessarily mean the system isn't working exactly as it should; perhaps the bad review noticed serious flaws in the proposal that the other reviewers missed? Even so, there are two reasons that this seemingly innocuous procedure (get expert feedback and average it) can lead to excessive risk aversion for a funder.

First, scores are asymmetrically distributed. In Jerrim and de Vries' data, the average score is 4.4, and more than half of reviews are a 5 or 6. If you believe a proposal is really bad, it's feasible to strongly signal your dislike by giving it a score of 1, which is 3.4 below the average. But if you really love a proposal, it's hard to signal that with your scoring: the best you can do is give it a 6, which is just 1.6 above the average. When you average out people who really love and really hate a project, the haters have more leverage over the final score.2

Second, low levels of inter-reviewer correlation imply there's a lot of randomness in the reviewing process. That could be bad for transformative research proposals, if they are weirder and end up getting more reviews. For example, a proposal that combines ideas from disparate sources might need more reviewers to be adequately vetted, since it takes multiple reviewers to cover each of the idea's sources. That could be a problem because, in general, there will be more variation in the average scores of proposals that receive fewer reviewers. For example, in Jerrim and de Vries' data, on average about 25% of reviewers rate proposals as 6/6. If you h...
February 2023 Updates
New Things Under the Sun is a living literature review; as the state of the academic literature evolves, so do we. This post highlights some recent updates.

Local Learning

Last year I wrote a post called Remote Breakthroughs about the changing nature of innovation among remote collaborators. Part of that post discussed evidence that local interactions do a better job of exposing us to new ideas than remote interactions. I've now spun that discussion out into its own expanded article, specifically on that topic. This was mostly prompted by a new paper, van der Wouden and Youn (2023). Here's an excerpt from that new article, titled Local Learning:

In my experience, the internet can't be beat for encountering a diversity of ideas. But often, that encounter is at a pretty surface level. You read a tweet; a headline; a blog post synthesizing some studies, etc. Nothing wrong with surface level engagement - you can't engage in everything deeply. But pushing the innovation frontier increasingly requires deep engagement with at least some domain of knowledge. And there are reasons to think that offline/in-person interaction might be better for forging that kind of deep engagement with new ideas.

To start, let's look at van der Wouden and Youn (2023), which wants to see if in-person collaboration on academic projects more reliably leads to the transfer of knowledge between coauthors than remote collaboration. To answer that question, the authors gather data on 1.7mn academics who, at some point over the period 1975-2015, produce a sequence of three papers that exhibit a very specific pattern. In reverse order, they need:

- The last paper in the sequence to be solo-authored
- The second-to-last paper to be coauthored with at least one other author
- At least one more prior paper

They're going to pull all that information from the Microsoft Academic Graph. Next, they want an estimate of the knowledge domains in which the academic is fluent enough to publish an original research paper. To get those, they leverage the 292 subdisciplines that the Microsoft Academic Graph tags papers with. By looking at the subdisciplines tagged to your work, they can get an idea about what you are an expert in, and also how your areas of expertise grow over time. Moreover, by focusing specifically on solo-authored work, they can be most sure that it's really you who is the expert, and not one of your coauthors.

The main idea of the paper is to figure out an academic's areas of expertise based on all papers they've published, up to and including the first one in the sequence of three alluded to above. Next, they look to see if the second paper in the above sequence was conducted with local or remote collaborators. Finally, they look at the final paper in the sequence, which was solo-authored, and see if it is tagged with any new subdisciplines, relative to all your papers up to and including the first one in the sequence. If so, they take that as evidence that the author gained expertise in a new subject in between the first and third paper, possibly via their interaction with their collaborators on the second paper. Lastly, they can see if this "learning" effect is more common when you work with local or remote coauthors. In the following figure, we can see how the probability of writing a solo-authored paper tagged with a new subdiscipline changes when you work with increasingly distant colleagues on your previous paper.
van der Wouden and Youn call this the "learning rate." If your collaborators were local (under 700m away, a 10-minute walk), then about 7.5% of the time, your next paper is on something you haven't written about before. If your collaborators are out of town, say more than 25km away, the probability drops to more like 4.5%.

From van der Wouden and Youn (2023)

This pattern is consistent across fields, though stronger in some fields than others. For example, the relative probability of pivoting to a new topic after a local collaboration compared to a distant one is generally higher in STEM fields than in non-STEM fields. Moreover, while the figure above is raw data, you get similar effects when you toss in a bunch of additional control variables: the number of coauthors, the career stage of the academic, the ranking of the institution they are affiliated with, and so on.

The post then goes on to discuss another paper, Duede et al. (2022), which was originally part of the Remote Breakthroughs article. It closes with some discussion of how these trends have changed over time, and ends up arguing these results are consistent with a theme I've argued elsewhere: that proximity is good for meeting new people outside your usual professional context, but not so necessary for productive collaboration once these relationships are formed.

Read "Local Learning"

A Bit Less Local Learning

The article Planes, Trains, Automobiles, and Innovation is about a similar theme: how changing technology affects the ability to collaborate over a distance. The article originally covered three studies, each about how the expansion of transit options - new air routes, new train routes, or more local roads - facilitated more remote collaboration among scientists and inventors. I've added to this article a discussion of Koh, Li, and Xu (2022), which looks at the expansion of the Beijing subway system:

Koh, Li, and Xu (2022) studies the impact of the dramatic expansion of the Beijing subway on private sector innovation. The subway system in Beijing grew pretty slowly until the 2000s, when the pace of expansion dramatically ramped up ahead of the 2008 Summer Olympics and as part of the government's stimulus response to the 2007-2008 financial crisis. The number of subway stations went from 41 to 379 between 2000 and 2018, while the total track length grew from 54.1km to 655km over the same time frame. Koh, Li, and Xu cut Beijing up into 0.5km squares and look at what happens to the number of patents by distant collaborators residing in different 0.5km blocks. Across a lot of different approaches,1 they find a subway connection that reduces travel time between blocks by at least an hour leads to a 15-38% increase in patent applications filed.

Change in number of patents between blocks after travel time is reduced by an hour or more. From Koh, Li, and Xu (2022)

Now that this article also discusses subways, I could have changed the title to "Planes, Trains, Subways, Automobiles, and Innovation", but since that is quite a mouthful I instead changed the title to Transportation and Innovation.

Read "Transportation and Innovation"

Long Distance Learning

The article The "idea" of being an entrepreneur tries to argue that one important factor in whether people choose to become entrepreneurs is whether they even conceive of entrepreneurship as an option.
The piece argues this idea - that yes, even people like you can be an entrepreneur - is often spread by social contagion from people who are like us but are also entrepreneurs. I've now added a new section to this article about the transmission of the "idea" of entrepreneurship via mass media. If transmitting the "idea" of entrepreneurship matters, then countries with mass media celebrating entrepreneurship might get more entrepreneurs, because people consuming this media diet are more likely to consider entrepreneurship a viable option.

This is a tough hypothesis to test, since mass media tends to reflect the society it is targeting. In a society with lots of entrepreneurship and lots of mass media celebrating entrepreneurship, which caused which? Likely it's a bit of both! Another reason it's hard to test this hypothesis is because, ideally, you want to compare people exposed to one mass media diet to people exposed to another one, but who are otherwise identical. But most people have access to the same mass media (that's what makes it mass!), and so if one group chooses not to consume it, it's likely because they differ in some way.

Slavtchev and Wyrwich (2023) identifies one peculiar instance in history that does permit testing this hypothesis. When Germany split into East and West following the Second World War, most forms of entrepreneurship were banned in East Germany. From the 1960s on though, West Germany consciously crafted and broadcast TV programming into East Germany, as a matter of policy. Compared to East German television, West German television tended to celebrate individualism, business, entrepreneurship, and the like. This programming was popular, if you could get it: surveys indicate over 90% of people who could access the broadcasts tuned in at least several times per week. But not everyone could get it. A few regions that were far from the broadcast towers, or where signals were blocked by hills and mountains, could not easily access this programming, and surveys indicate many fewer people in these regions regularly watched West German programming: just 15% several times a week, and 68% never. Yet besides their geographic distance and different topography, the regions of East Germany with access to West German television don't seem to have been much different from the (small number of) regions of East Germany without.

Slavtchev and Wyrwich argue this is the kind of natural experiment we're looking for: mass media promoting entrepreneurship in a society that is not already celebrating it (it was mostly outlawed!), and different levels of exposure to this mass media among groups that were otherwise similar. Lastly, after the collapse of East Germany's communist regime, many forms of entrepreneurship became legal once again, so Slavtchev and Wyrwich can actually see if this differential mass media exposure mattered: do parts of the former East Germany with greater exposure to West German television end up with more entrepreneurship than those without?2 Yes. The figure below tracks the per capita number of new businesses and new self-employed indivi...
Announcing a Site Index (and an AMA)
Dear reader,

As the number of articles I've written has grown, it's become harder and harder to make sense of the New Things Under the Sun back catalog. That's a shame, because I update those older posts as the literature evolves, to keep them close to the academic frontier. So today I am happy to announce the launch of a set of indices to help readers figure out what they might want to read on New Things Under the Sun.

Just a picture - check out the real thing here

There are nine different indices, each of which gathers together all the articles I've written related to a specific topic, such as "how science works (or doesn't)", "geography of innovation", and "how innovation is changing." Inside each index, I've listed all the relevant articles on the topic and written a 3-6 bullet point description of each article's contents. My hope is this makes it easy to find what you want to find. Click the button below to check out the indices and see if there is an article on the site that you would have read but didn't know existed!

Visit the Site Index

But what if you've looked and what you want isn't there? In that case, my advice is to head over to the progress forum, where I am answering user-submitted questions for the next 48 hours, and ask away.

Thanks everyone; back to your regularly scheduled programming next time!

-Matt

P.S. Special thanks to my unpaid, semi-competent intern, chatGPT, for assistance writing bullet point descriptions of New Things Under the Sun articles. You didn't really make this project easy, but you did make it feasible. Keep at it, kid, and I know, someday, I won't have to rewrite 50% of your work.

P.P.S. Happy Valentine's Day everyone!
Innovators Who Immigrate
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here.

Talent is spread equally over the planet, but opportunity is not. Today I want to look at some papers that try to quantify the costs to science and innovation from barriers to immigration. Specifically, let's look at a set of papers on what happens to individuals with the potential to innovate when they immigrate versus when they do not. (See my post Importing Knowledge for some discussion of the impact of immigration on native scientists and inventors.)

All of these papers confront the same fundamental challenge: successfully immigrating is (usually) a matter of choice, selection, and luck. For the purposes of investigating the impact of immigration on innovation, that means we can't simply compare immigrants to non-immigrants. For example, immigrants (usually) choose to migrate, and if they do so because they believe they will be more successful abroad, that signals something about their underlying level of ambition and risk tolerance. That, in turn, might mean they are more likely to be innovative scientists or inventors, even if they had not migrated. Compounding this problem, countries impose all sorts of rules about who is allowed to migrate, and many of these rules make it easier to migrate if you can demonstrate some kind of aptitude and talent. That means successful immigrants are often going to be drawn from a pool of people more likely to have the talent to succeed in science and invention, even if they had not immigrated.

These are challenges, but there is also a degree of capricious luck in immigration (and life in general). There are people - perhaps many people - who want to immigrate and have extraordinary talent, but who do not for all sorts of random reasons. Compared to otherwise identical people who do migrate, they might lack information or financial resources, or face higher barriers to legal immigration. Indeed, in many cases, the right to immigrate is literally handed out by lottery! The papers we'll look at employ various strategies to try and find comparable groups of people who immigrate and people who do not, to infer the impact of immigration and place on innovation.

Talented High Schoolers

One way to deal with the selection effect is to try and measure the talent of a sample of both immigrants and non-immigrants and then compare immigrants and non-immigrants who appear to have similar underlying talent. Agrawal and Gaule (2020) and Agrawal et al. (2023) do this with the International Mathematical Olympiad. The International Mathematical Olympiad is a prominent math competition for high school students from around the world that's been held annually for decades. Up to six representatives from each country are selected via regional and national competitions, and then travel to a common city and try to solve six different (presumably very hard) math problems. Because it's an Olympiad, winners take home gold, silver and bronze medals. Agrawal and coauthors know the scores of all the competitors from 1981 to 2000 and then look to see what happens to the competitors later in life. In Agrawal and Gaule (2020) they show that scores on these math competitions strongly predict later success as a mathematician.
That in itself is surprising, given that the talents for doing creative mathematical research may, in principle, differ substantially from those needed to perform well in a competition.

From Agrawal and Gaule (2020)

Their dataset also establishes something else: students from low-income countries are less likely to obtain PhDs in math than students with the same score from high-income countries. In Agrawal et al. (2023) they use this dataset to look at the different fates of those who immigrate from their home country and those who do not. On average, a migrant is about twice as likely to be employed in academia as a mathematician as someone from the same country who got the same math score but did not migrate.

Of course, while math scores help address the problem of selection, this doesn't really get at the problem of choice. Perhaps people who really want to be mathematicians are disproportionately likely to migrate, since the highest-ranked mathematics departments tend to be in the USA, and it's this difference in career intention that explains the difference in career outcomes between migrants and non-migrants. Agrawal et al. (2023) provides some additional evidence that this is not purely an outcome of career choice. For one, looking only at migrant and non-migrant students who both become math academics (in their own country or abroad), they find the migrants go on to garner about 85% more citations to their publications than their domestic peers (remember, with the same score in math competitions). We might think citations aren't a great measure of math skill (see my post Do Academic Citations Measure the Value of Ideas?), but they also show migrant academics are about 70% more likely to become speakers at the International Congress of Mathematicians (a non-citation-based measure of community recognition). So among people who ended up becoming academic mathematicians (either at home or abroad), the ones who migrated went on to have more distinguished careers, compared to peers who did equally well on math in high school.

But this is still pretty indirect evidence. Fortunately, Agrawal and coauthors also just asked Olympiad medalists directly about their preferences in a survey. Among respondents from low- and middle-income countries, 66% said they would have liked to do their undergraduate degree in the USA if they could have studied anywhere. Only 25% actually did. Just 11% said their first choice was to study in their home country. In fact, 51% did. Why didn't they study abroad if that's what they wanted to do? A bunch of the survey evidence suggests the problem was money. For 56% of the low- and middle-income respondents, the availability of financial assistance was very or extremely important. Students from low- and middle-income countries were also much more likely than their peers in high-income countries to choose a hypothetical funded offer of admission at a lower-ranked school.

Gibson and McKenzie (2014) provides some complementary evidence outside of mathematics. As part of a larger project on migration and brain drain, they identify 851 promising young New Zealanders who graduated high school between 1976 and 2004. These students either represented New Zealand on its International Mathematical Olympiad or International Chemistry Olympiad teams, were top in exams, or earned the New Zealand equivalent of the valedictorian rank. Like Agrawal and coauthors, they can then see what happens to New Zealanders who migrate, versus those who remain.
They find researchers who moved abroad publish more than those who do not. As noted, this poses some potential problems; even though we know all these students were talented, those who migrate may have different unobserved levels of skill, ambition, risk tolerance, or something. One way they attempt to deal with this is to focus their attention on the subset of researchers who actually do migrate away from New Zealand, and then to look at what happens to their research output when they move back. The idea here is that those who left were, at least initially, displaying similar levels of skill, ambition, risk tolerance, and so forth. (If so, why did they return? We'll get to that.) For each New Zealand migrant researcher who returns to New Zealand, Gibson and McKenzie try to find another migrant who stayed abroad, but is similar in age, gender, what they studied in high school, highest degree, and so on. They then look to see what happens to the number of citations to their academic work. While both groups had essentially the same citations prior to return migration, after one group returned to New Zealand, the citations to their work declined substantially relative to the citations of migrants who remained abroad.

From Gibson and McKenzie (2014). Citations fall at the end partially due to a mechanical effect: there are fewer years available for more recent papers to receive citations.

Again, we see that being abroad was good for research productivity. But again, perhaps we are concerned that there is an important but unstated difference between New Zealanders who stayed abroad and those who returned home. Perhaps the ones who came back simply couldn't cut it? But we actually don't see much evidence of that. The figure above matches each returnee to someone who stayed abroad based on a number of characteristics. But one characteristic they were not matched on is citations to their academic work. And yet, prior to returning, their citations were on a very similar trajectory. And like Agrawal and coauthors, Gibson and McKenzie also surveyed their subjects to see why they moved back. Most of the answers were not related to individual research productivity, but had to do with, for example, concerns about aging parents, child-raising, and the location of extended family.

Scholarship Restrictions

Another line of evidence comes from Kahn and MacGarvie (2016), which focuses on PhD students who come to America from abroad. The paper's big idea is to compare students who come on the prestigious Fulbright program to similar peers who were not Fulbright fellows. The students and their matches are really similar in this case: they graduated from the same PhD program, either studying under the exact same advisor and graduating within 3 years of each other, or merely studying in the same program but graduating in the same year. The only difference was that the Fulbright students faced a requirement to leave the USA for two years after finishing their studies, whereas the matched students faced no such...
Age and the Nature of Innovation
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps here. The previous post also now has a podcast available here. Subscribe now Are there some kinds of discoveries that are easier to make when young, and some that are easier to make when older? Obviously yes. At a minimum, innovations that take a very long time basically have to be done by older innovators. So what kinds of innovations might take a long time to complete? Perhaps those that draw on deep wells of specialized knowledge that take a long time to accumulate. Or perhaps those that require grinding away at a question for years and decades, obsessively seeking the answers to riddles invisible to outsiders. What about innovations that are easier when young? Well, we can at least say they shouldn’t be the kinds of innovations that take a long time to achieve. That means discoveries that can be made with years, not decades, of study. But what kinds of innovations that don’t take long study to make are still sitting around, like unclaimed $20 bills on the sidewalk? One obvious kind of unclaimed innovation is the kind that relies on ideas that have only been very recently discovered. If people learn about very new ideas during their initial training (for example, for a PhD), then we might expect young scientists to disproportionately make discoveries relying on frontier knowledge. At the same time, we might look for signs that older scientists build on older ideas, but perhaps from a place of deeper expertise. Indeed, we have some evidence this is the case. Age, Frontier Ideas, and Deepening Expertise Let’s start with Yu et al. (2022), a study of about 7mn biomedical research articles published between 1980 and 2009. Yu and coauthors do not know the age of the scientists who write these articles, but as a proxy they look at the time elapsed since their first publication. Below are several figures, drawn from data in their paper, on what goes into an academic paper at various stages of a research career. In the left column, we have two measures drawn from the text of paper titles and abstracts. Each of these identifies the “concepts” used in a paper’s title/abstract: these are defined to be the one-, two-, and three-word strings of text that lie between punctuation and non-informative words. The right column relies on data from the citations made by an article. In each case, Yu and coauthors separately estimate the impact of the age of the first and last author.1 Moreover, these are the effects that remain after controlling for various other factors, including what a particular scientist does on average (in economics jargon, they include author fixed effects). Together, they generally tell a story of age being associated with an increasing reliance on a narrower set of older ideas. Source: Regression coefficients with author fixed effects in Tables 4 and 5 of Yu et al. (2022) Let’s start in the top left corner - this is the number of concepts that appear in a title or abstract which are both younger than five years and go on to be frequently used in other papers. Measured this way, early career scientists are more likely to use recent and important new ideas. Moving to the top-right figure, we can instead look at the diversity of cited references. We might expect this to rise over a career, as scientists build a larger and larger knowledge base.
But in fact, the trend is the opposite for first authors, and mixed at best for last authors. At best, the tendency to expand the disciplinary breadth of references as we accumulate more knowledge is offset by rising disciplinary specialization. Turning to the bottom row, on the left we have the average age of the concepts used in a title and abstract (here “age” is the number of years that have elapsed since the concepts were first mentioned in any paper), and on the right the average age of the cited references (that is, the number of years that have elapsed since the citation was published). All measures march up and to the right, indicating a reliance on older ideas as scientists age. This is not a phenomenon peculiar to the life sciences. Cui, Wu, and Evans (2022) compute some similar metrics for a wider range of fields than Yu and coauthors, focusing their attention on scientists with successful careers lasting at least twenty years and once again proxying scientist age by the time elapsed since their first paper was published. On the right, we again have the average age of cited references; these also rise alongside scientist age. Source: Regression coefficients with author fixed effects in Tables 4 and 5 of Yu et al. (2022) On the left, we have a measure based on the keywords the Microsoft Academic Graph assigns to papers (of which there are more than 50,000). Between two subsequent years, Cui and coauthors calculate the share of keywords assigned to a scientist’s papers which recur in the next year. As scientists age, their papers increasingly get assigned the same keywords from year to year (though note the overall effect size is pretty small), suggesting deeper engagement with a consistent set of ideas. Lastly, we can look outside of science to invention. Kalyani (2022) processes the text of patents to identify technical terminology and then looks for patents that have a larger than usual share of technical phrases (think “machine learning” or “neural network”) that are not previously mentioned in patents filed in the preceding five years. When a patent has twice as many of these new technical phrases as the average for its technology type, he calls it a creative patent. He goes on to show these “creative” patents are much more correlated with various metrics of genuine innovation (see the patent section of Innovation (mostly) gets harder for more discussion). Kalyani does not have data on the age of inventors, but he does show that repeat inventors produce increasingly less creative patents as time goes by. From Kalyani (2022) This figure shows, on average, an inventor’s first patent has about 25% more new technical phrases than average, their second has only 5% more, and the third patent has about the same number of new technical phrases as average. Subsequent patents fall below average. This is consistent with a story where older inventors increasingly rely on older ideas. As discussed in more detail in the post Age and the Impact of Innovations, over the first 20 years of a scientist’s career, the impact of a scientist’s best work is pretty stable: citations to the top cited paper published over some multi-year timeframe are pretty consistent. The above suggests that might conceal some changes happening under the hood though. At the outset, perhaps a scientist’s work derives its impact through engagement with the cutting edge. Later, scientists narrow their focus and impact arises from deeper expertise in a more tightly defined domain.
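To make these text-based measures a bit more concrete, here is a rough sketch of how one might pull candidate “concepts” out of a title or abstract and compute their average age, in the spirit of Yu and coauthors’ approach. The extraction rules, the stopword list, and the first_seen lookup are illustrative stand-ins of my own, not the authors’ actual pipeline.

```python
import re

# Words treated as non-informative boundaries (illustrative, not Yu et al.'s actual list)
STOPWORDS = {"the", "a", "an", "of", "in", "and", "for", "with", "on", "to", "by"}

def extract_concepts(text, max_len=3):
    """Candidate 'concepts': one- to three-word strings bounded by punctuation or stopwords."""
    concepts = set()
    for span in re.split(r"[^\w\s]", text.lower()):  # split the text on punctuation
        run = []
        for word in span.split():
            if word in STOPWORDS:   # a stopword ends the current run of informative words
                run = []
                continue
            run.append(word)
            for n in range(1, min(max_len, len(run)) + 1):
                concepts.add(" ".join(run[-n:]))
    return concepts

def concept_age_stats(text, pub_year, first_mention):
    """Average concept age and count of 'recent' concepts (first mentioned < 5 years ago)."""
    ages = [pub_year - first_mention[c] for c in extract_concepts(text) if c in first_mention]
    if not ages:
        return {"avg_age": None, "n_recent": 0}
    return {"avg_age": sum(ages) / len(ages), "n_recent": sum(a < 5 for a in ages)}

# Hypothetical lookup of when each concept first appeared in any paper
first_seen = {"neural network": 1986, "transfer learning": 2010, "gene expression": 1965}
print(concept_age_stats("Transfer learning for gene expression with a neural network",
                        pub_year=2012, first_mention=first_seen))
# -> {'avg_age': 25.0, 'n_recent': 1}
```

Note that Yu and coauthors’ headline measure of new ideas additionally requires that a recent concept go on to be frequently used in later papers, which would need a forward-looking count of how often each concept appears after publication.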
Conceptual and Experimental Innovation So far we’ve seen some evidence that scientific discoveries and inventions are more likely to draw on recent ideas when the innovator is young, and an older, narrower set of ideas (plus deeper expertise?) when the innovator is older. I suspect that’s because young scientists hack their way to the knowledge frontier during their training period. As scientists begin active research in earnest, they certainly invest in keeping up with the research frontier, but it’s hard to do this as well as someone who is in full-on training mode. Over a 20-40 year career, the average age of concepts used and cited goes up by a lot less than 20-40 years; but it does go up (actually, it’s pretty amazing the average age of concepts used only goes up 2 years in Yu et al. 2022). I argued at the outset we might expect this. The young cannot be expected to make discoveries that require a very long time to bring about. But among the set of ideas that don’t take a long time to bring about, they need to focus on innovations that have not already been discovered. One way to do that is to draw on the newest ideas. But this might not be the only way. The economist David Galenson has long studied innovation in the arts, and argues it is useful to think of innovative art as emerging primarily from two approaches. The first approach is "experimental." This is an iterative, feedback-driven process with only vaguely defined goals. You try something, almost at random, you stand back and evaluate, and then you try again. The second approach is “conceptual.” It entails a carefully planned approach that seeks to communicate or embody a specific preconceived idea. Then the project is executed and emerges more or less in its completed form. Both require a mastery of the existing craft, but the experimental approach takes a lot longer. Essentially, it relies on evolutionary processes (with artificial rather than natural selection). Its advantage is that it can take us places we can't envision in advance. But, since it takes so long to walk the wandering path to novelty, Galenson argues that in the arts, experimental innovators tend to be old masters. The Bathers, by Paul Cezanne, one of Galenson’s experimental innovators. Begun when Cezanne was 59. Conceptual approaches can, in principle, be achieved at any point in a lifecycle, but Galenson argues there are forces that ossify our thinking and make conceptual innovation harder to pull off at old ages. For one, making a conceptual jump seems to require trusting in a radically simplified schema (complicated schemas are too hard to plan out in advance) from which you can extrapolate into the unknown. But as time goes on, we add detail and temper our initial simplifications, adding caveats, carveouts and extensions. We no longer trust the simple models to leap into the unknown. Perhaps for these reasons, conceptual innovators tend...
Age and the Impact of Innovations
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. No podcast today, as I am sick and can’t talk without coughing: maybe later. Also, there is more to say about age and innovation, so stay tuned! Subscribe now Scientists are getting older. Below is the share of employed US PhD scientists and engineers in three different age ranges: early career (under 40), mid-career (ages 40-55), and late career (ages 55-75). The figure covers the 26 years from 1993-2019. Author calculations. Sources: NSF Survey of Doctorate Recipients (1993-2019), data drawn from occupation by age tables Over this period, the share of mid-career scientists fell from about half to just under 40%. Most (but not all) of that decline has been offset by an increase in the share of late career scientists. And within the late career group, the share older than 65 has more than doubled to 27% over this time period.1 This trend is consistent across fields. Cui, Wu, and Evans (2022) look at more than one million scientists with fairly successful academic careers - they publish at least 10 articles over a span of at least 20 years. Cui and coauthors compute the share of these successful scientists who have been actively publishing for more than twenty years. Across all fields, it’s up significantly since 1980 (though, consistent with the previous figure, this trend may have peaked around 2015). From Cui, Wu, and Evans (2022) Alternatively, we can get some idea about the age of people doing active research by looking at the distribution of grants. At the NIH, the share of young principal investigators on R01 grants has dropped from a peak of 18% in 1983 to about 3% by 2010, while the share older than 65 has risen from almost nothing to above 6%. From Rockey (2012) This data ends in 2010, but the trend towards increasing age at receiving the first NIH grant has continued through 2020. Is this a problem? What’s the relationship between age and innovation? Aging and Average Quality This is a big literature, but I’m going to focus on a few papers that use lots of data to get at the experience of more typical scientists and inventors, rather than the experience of the most elite (see Jones, Reedy and Weinberg 2014 for a good overview of an older literature that focuses primarily on elite scientists). Yu et al. (2022) look at about 7mn biomedical research articles published between 1980 and 2009. Yu and coauthors do not know the age of the scientists who write these articles, but as a proxy they look at the time elapsed since their first publication. They then look at how various qualities of a scientific article change as a scientist gets older. First up, data related to the citations ultimately received by a paper. On the left, we have the relationship between the career age of the first and last authors, and the total number of citations received by a paper.2 On the right, the same thing, but expressed as a measure of the diversity of the fields that cite a paper - the lower the number, the more the citations received are concentrated in a small number of fields. In each case, Yu and coauthors separately estimate the impact of the age of the first and last author.3 Note also, these are the effects that remain after controlling for a variety of other factors. In particular, the charts control for the typical qualities of a given author (i.e., they include author fixed effects).
See the web appendix for more on this issue. Also, they’re statistical estimates, so they have error bars, which I’ve omitted, but which do not change the overall trends. Source: Regression coefficients with author fixed effects in Table 2 of Yu et al. (2022) The story is a straight-forward one. Pick any author at random, and on average the papers they publish earlier in their career, whether as first author or last author, will be more highly cited and cited by a more diverse group of fields, than a paper they publish later in their career. In the figure below, Cui, Wu, and Evans (2022) provide some complementary data that goes beyond the life sciences, focusing their attention on scientists with successful careers lasting at least twenty years and once again proxying scientist age by the time elapsed since their first paper was published. They compute a measure of how disruptive a paper is, based on how often a paper is cited on its own, versus in conjunction with the papers it cites. The intuition of this disruption measure is that when a paper is disruptive, it renders older work obsolete and hence older work is no longer cited by future scientists working in the same area. By this measure, as scientists age their papers get less and less disruptive (also and separately, papers are becoming less and less disruptive over time, as discussed more here).4 From Cui, Wu, and Evans (2022). There is an error in the figure’s legend: the top line corresponds to the 1960s, the one below that to the 1970s, below that is the 1980s, and below that is the 1990s. Last up, we can even extend these findings to inventors. Kaltenberg, Jaffe, and Lachman (2021) study the correlation between age and various patent-related measures for a set of 1.5mn inventors who were granted patents between 1976 and 2018. To estimate the age of inventors, Kaltenberg and coauthors scrape various directory websites that include birthday information for people with similar names as patentees, who also live in the same city a patentee lists. They then compute the relationship between an inventor’s estimated age and some version of each of the metrics discussed above. Once again, these results pertain to what remains after we adjust for other factors (including inventor fixed effects, discussed below). From Kaltenberg, Jaffe, and Lachman (2021) On the left, we have total citations received by a patent. In the middle, a measure of the diversity of the technologies citing a patent (lower means citations come from a narrower set of technologies). And on the right, our measure of how disruptive a patent is, using the same measure as Cui, Wu, and Evans. It’s a by-now familiar story: as inventors age, the impact of their patented inventions (as measured by citations in various ways) goes down. (The figures are for the patents of solo inventors, but the same trend is there for the average age of a team of inventors) So in all three studies, we see similar effects: the typical paper/patent of an older scientist or inventor gets fewer citations and the citations it does get come from a smaller range of fields, and are increasingly likely to come bundled with citations to older work. And the magnitudes involved here are quite large. In Yu et al. (2022), the papers published when you begin a career earn 50-65% more citations than those published at the end of a career. The effects are even larger for the citations received by patentees.
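To see how a disruption score of this kind can be computed, here is a minimal sketch of a CD-style index on a toy citation network. The function and data structure are my own illustration under simplified assumptions, not code from Cui, Wu, and Evans or from Kaltenberg, Jaffe, and Lachman, who work with full-scale citation graphs.

```python
def disruption_index(focal, cites):
    """
    CD-style disruption score for one focal paper or patent.

    cites maps each document id to the set of ids it cites. The score runs from
    -1 (every later document citing the focal one also cites its references,
    i.e., consolidating) to +1 (later documents cite the focal one alone,
    i.e., disruptive).
    """
    references = cites.get(focal, set())
    citing_focal = {d for d, refs in cites.items() if focal in refs}
    citing_refs = {d for d, refs in cites.items() if d != focal and refs & references}

    n_focal_only = len(citing_focal - citing_refs)  # cite focal but ignore its references
    n_both = len(citing_focal & citing_refs)        # cite focal alongside its references
    n_refs_only = len(citing_refs - citing_focal)   # bypass focal, cite only its references

    total = n_focal_only + n_both + n_refs_only
    return 0.0 if total == 0 else (n_focal_only - n_both) / total

# Toy network: "X" cites "A" and "B"; three later documents cite various things.
toy = {
    "X": {"A", "B"},
    "P1": {"X"},        # cites X alone (disruptive signal)
    "P2": {"X", "A"},   # cites X together with one of X's references (consolidating signal)
    "P3": {"A"},        # bypasses X entirely
}
print(disruption_index("X", toy))  # 0.0 for this evenly balanced toy example
```

In the toy network the disruptive and consolidating signals exactly cancel, so the focal document scores zero; a document only ever cited on its own would score +1, and one always cited alongside its own references would score -1.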
The Hits Keep Coming This seems like pretty depressing news for active scientists and inventors: the average paper/patent gets less and less impactful with time. But in fact, this story is misleading, at least for scientists. Something quite surprising is going on under the surface. Liu et al. (2018) study about 20,000 scientists and compute, at each point in a career, the probability that a scientist’s personal most highly cited paper lies in the future. The results of the previous section suggest this probability should fall pretty rapidly. At each career stage, your average citations are lower, and it would be natural to assume the best work you can produce will also tend to be lower impact, on average, than it was in earlier career stages. But this is not what Liu and coauthors find! Instead, they find that any paper written, at any stage in your career, has about an equal probability of being your top cited paper! The following figure illustrates their result. Each dot shows the probability that either the top cited paper (blue), second-most cited paper (green), or third-most cited paper (red) lies in the future, as you advance through your career (note it’s actually citations received within 10 years, and normalized by typical citations in your field/year). The vertical axis is this percent. The horizontal one is the stage in your career, measured as the fraction of all the papers you will ever publish that have been published so far. From Liu et al. (2018), extended data figure 1 This number can only go down, because that’s how time works (there can’t be a 50% chance your best work is in the future today, and a 60% chance it’s in the future tomorrow). But the figure shows it goes down in a very surprising way. Assuming each paper you publish has the same probability of being your career best, then when you are 25% of the way through your publishing career, there is a 25% chance your best work is behind you and a 75% chance it’s ahead of you. By the time you are 50% of the way through your publishing career, the probability the best is yet to come will have fallen to 50%. And so on. And that is precisely what the figure appears to show! What’s going on? Well, Yu and coauthors show that the number of publications in a career is not constant. Through the first 20-25 years of a career, the number of publications a scientist attaches their name to seems to rise before falling sharply. Since the average is falling over this period, but the probability of a top cited paper is roughly constant, it must be that the variance is rising (the best get better, the worst get worse), in such a way that the net effect is a falling average. And Yu and coauthors present evidence that this is the case. In the figure below, we track the average number of citations that go to hit papers in two different ways. In dark blue, we simply have the additional citations to the top cited paper by career stage. Note, unlike average citations, it does not fall s...
December 2022 Updates
New Things Under the Sun is a living literature review; as the state of the academic literature evolves, so do we. This post highlights some recent updates. Subscribe now Science: Trending Less Disruptive The post “Science is getting harder” surveyed four main categories of evidence (Nobel prizes, top cited papers, growth in the number of topics covered by science, and citations to recent work by patents and papers) to argue it has become more challenging to make scientific discoveries of comparable “size” to the past. This post has now been updated to include an additional category of evidence related to a measure of how disruptive academic papers are. From the updated article: …The preceding suggested a decline in the number of new topics under study by looking at the words associated with papers. But we can infer a similar process is under way by turning again to their citations. The Consolidation-Disruption Index (CD index for short) attempts to score papers on the extent to which they overturn received ideas and birth new fields of inquiry. To see the basic idea of the CD index, suppose we want to see how disruptive some particular paper x is. To compute paper x’s CD index, we would identify all the papers that cite paper x or the papers x cites itself. We would then look to see if the papers that cite x also tend to cite x’s citations, or if they cite x alone. If every paper citing paper x also cites x’s own references, paper x has the minimum CD index score of -1. If none of the papers citing paper x cite any of paper x’s references, paper x has the maximum CD index score of +1. The intuition here is that if paper x overturned old ideas and made them obsolete, then we shouldn’t see people continuing to cite older work, at least in the same narrow research area. But if paper x is a mere incremental development, then future papers continue to cite older work alongside it. That’s the idea anyway; does it actually map to our ideas of what a disruptive paper is? It’s a new measure and its properties are still under investigation, but Wu, Wang, and Evans (2019) tried to validate it by identifying sets of papers that we have independent reasons to believe are likely to be more or less disruptive than each other. They then checked to see that the CD index matched predictions. Nobel prize-winning papers? We would expect those to be disruptive, and indeed, Wu and coauthors find they tend to have high CD index scores on average. Literature review articles? We would expect those to be less disruptive than original research, and their CD index is indeed lower on average than the CD index of the papers they review. Articles which specifically mention another person in the title? We would expect those to be incremental advances, and they also have lower CD index scores. Lastly, for a sample of 190 papers suggested by a survey of 20 scholars as being distinctively disruptive or not disruptive, the CD index closely tracked which papers were disruptive and which were not. Park, Leahey, and Funk (2022) compute the CD index for a variety of different datasets of academic publications, encompassing many millions of papers. Below is a representative result from 25 million papers drawn from the Web of Science. Across all major fields, the CD index has fallen substantially. Declining Disruption - from Park, Leahey, and Funk (2022) This decline is robust to a lot of different attempts to explain it away.
For example, we might be worried that this is a mechanical outcome of the tendency to cite more papers, and to cite older papers (which we discuss in the next section). For any given paper x, that would increase the probability we cite paper x’s references, in addition to x. Park, Leahey, and Funk try to show this isn’t solely driving their results in a few different ways. For example, they create placebo citation networks, by randomly shuffling the actual citations papers make to other papers. So instead of paper y citing paper x, they redirect the citation so that paper y now cites some other paper z, where z is published in the same year as x. This kind of reshuffling preserves the tendency over time of papers to cite more references and to cite older works. But when you compute the CD index of these placebo citation networks, they exhibit smaller declines than in the actual citation networks, suggesting the decline of disruption isn’t just a mechanical artifact of the trend towards citing more and older papers. Lastly, it turns out this decline in the average value of the CD index is not so much driven by a decrease in the number of disruptive papers, as it is by a massive increase in the number of incremental papers. The following figure plots the absolute number of papers published in a given year with a CD index in one of four ranges. In blue, we have the least disruptive papers, in red, the most disruptive, with green and orange in the middle. Annual # of publications in four CD index ranges. Blue = 0.0-0.25. Orange = 0.25-0.5. Green = 0.5-0.75. Red = 0.75-1.0. From Park, Leahey, and Funk (2022). While the annual number of the most disruptive papers (in red) grew over 1945-1995 or so, it has fallen since then so that the number of highly disruptive papers published in 2010 isn’t much different from the number published in 1945. But over the same time period, the number of the mostly incremental papers (in blue) has grown dramatically, from a few thousand a year to nearly 200,000 per year. As an aside, the above presents an interesting parallel with the Nobel prize results discussed earlier: Collison and Nielsen find the impact of Nobel prize-winning discoveries is not rated as worse in more recent years (except in physics), but neither is it rated better (as we might expect given the increase in scientific resources). Similarly, we are not producing fewer highly disruptive papers; we simply are not getting more for our extra resources. The updated article also includes some new discussion of additional text-based evidence for a decline in the number of topics under study in science, relative to the number of papers, again from Park, Leahey, and Funk (2022). It also adds in some evidence that the rise in academic citations to older works does not merely reflect a rise in polite but inconsequential citations - at least in recent times, the citations to older work are just as likely to be rated influential citations as the citations to younger work. Read the whole thing Creative Patents and the Pace of Technological Progress The article “Innovation (mostly) gets harder” has a similar conclusion to “Science is getting harder”, but applied to the case of technological progress: eking out a given proportional increase along some technological metric seems to require more and more effort.
The original article reviewed evidence from a few specific technologies (integrated circuits, machine learning benchmarks, agricultural yields, and healthcare) as well as some broad-based proxies for technological progress (firm-level profit analogues, and total factor productivity). I’ve now updated this article to include a discussion of patents derived from a fascinating PhD job market paper by Aakash Kalyani: …it’s desirable to complement the case studies with some broader measures less susceptible to the charge of cherry-picking. One obvious place to turn is patents: in theory, each patent describes a new invention that someone at the patent office thought was useful and not obvious. Following Bloom et al., below I calculate annual US patent grants1 per effective researcher. As a first pass, this data seems to go against the case study evidence: more R&D effort has been roughly matched by more patenting, and in fact, in recent years, patenting has increased faster than R&D effort! Is innovation, as measured by patents, getting easier? Author calculations. Annual patent grant data from here. US effective researchers computed by dividing annual R&D spending (see figure RD-1 here) by median wage for college educated US workers (spliced data series from Bloom et al., here). The trouble with the above figure is that patents shouldn’t really be thought of as a pure census of new inventions for a few reasons. First off, the propensity of inventors (and inventive firms) to seek patent protection for their inventions seems to have increased over time.2 So the observed increase in annual patenting may simply reflect an increase in the share of inventions that are patented, rather than any change in the number of new inventions. Second, patents vary a lot in their value. A small share of patents seem to account for the majority of their value. We don’t care so much about the total number of patents as the number of valuable patents. On the second problem at least, Kalyani (2022) shows that one way to separate the patent wheat from the patent chaff is to look at the actual text of the patent document. Specifically, Kalyani processes the text of patents to identify technical terminology and then looks for patents that have a larger than usual share of technical phrases (think “machine learning” or “neural network”) that are not previously mentioned in patents filed in the preceding five years. When a patent has twice as many of these new technical phrases as the average for its technology type, he calls it a creative patent. About 15% of patents are creative by this definition. Kalyani provides a variety of evidence that creative patents really do seem to measure new inventions, in a way that non-creative patents don’t. Creative patents are correlated with new product announcements, better stock market returns for the patent-holder, more R&D expenditure, and greater productivity growth. Non-creative patents, in general, are not. And when you look at the number of creative patents (in per capita terms - it’s the solid green line below), Kalyani finds they have been on the decline since at least 1990. From Kalyani (20...
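To make the “creative patent” definition concrete, here is a rough sketch of the two-step classification described above: compute the share of a patent’s technical phrases that do not appear in patents filed in the preceding five years, then flag the patent as creative if that share is at least twice the average for its technology class. The function names, toy phrases, and years are hypothetical illustrations, not Kalyani’s actual code or data, which is built from the full text of the patent corpus.

```python
def new_phrase_share(phrases, filing_year, mention_years, window=5):
    """Share of a patent's technical phrases that do not appear in any patent
    filed in the preceding `window` years. mention_years maps each phrase to
    the set of filing years in which it has previously appeared."""
    if not phrases:
        return 0.0
    def is_new(phrase):
        prior = mention_years.get(phrase, set())
        return not any(filing_year - window <= year < filing_year for year in prior)
    return sum(is_new(p) for p in phrases) / len(phrases)

def is_creative(phrases, filing_year, mention_years, tech_class_avg_share):
    """'Creative' if the patent's new-phrase share is at least twice the average
    share for its technology class (per the definition in the text)."""
    return new_phrase_share(phrases, filing_year, mention_years) >= 2 * tech_class_avg_share

# Toy example with hypothetical phrases and filing years
mentions = {"machine learning": {2013, 2016}, "neural network": {2015, 2017}}
patent_phrases = ["machine learning", "neural network", "adiabatic quantum scheduler"]
share = new_phrase_share(patent_phrases, filing_year=2018, mention_years=mentions)
print(round(share, 2), is_creative(patent_phrases, 2018, mentions, tech_class_avg_share=0.15))
# -> 0.33 True: one of three phrases is new, roughly twice a hypothetical 15% class average
```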
Answering Your Questions
To celebrate passing 10,000 subscribers, last week I asked for questions from readers. There were too many to answer, but here’s an initial 10. If I missed yours, or you want to submit another question, I’m going to add a reader questions section to the bottom of my future updates posts, so feel free to ask a question using this form and I’ll try to get to it in the future. Otherwise, back to normal posting next time. One more piece of news; I’ve joined Open Philanthropy as a Research Fellow! I will continue to write New Things Under the Sun while there, but among other things I’ll also be trying to expand the New Things Under the Sun model to more writers and more academic fields. More details will be coming down the road, but if you are an academic who wants to write the definitive living literature review for your passion topic, drop me an email (matt.clancy@openphilanthropy.org) and I’ll keep you in the loop! I’m sad to leave the Institute for Progress, which continues to do outstanding work I really believe in, but I will remain a senior fellow with them. On to your questions! Subscribe now What is the most critical dataset that you would like to do research on but currently does not exist or is not available? - Antoine Blanchard I’m going to dream big here: I would love to see a better measure of technological progress than total factor productivity or patents. One particularly interesting idea for this was suggested to me by Jeff Alstott. Imagine we collected the technical specifications of thousands (millions?) of different kinds of individual technologies that seek to give a representative cross-section of human capabilities: solar panels, power drills, semiconductors, etc. There is some precedent for trying to collect technical specifications for lots of technologies, but it has typically been pretty labor intensive. However, gathering and organizing this data at a huge scale seems to be entering the realm of possibility, with the digitization of so much data and better data scraping technology. For example, we now have some inflation indices based on scraping price data from the web at a very large scale. Once you have all this data, for each class of technology, you can map out the tradeoff among these specifications to map the set of available technologies. How tradeoffs evolve over time is a quite direct and tangible measure of technological progress. This kind of technique has been used, for example, to model technological progress in the automobile industry (see image below). You then need a way to normalize the rate of progress across very different domains, and to weight progress across different goods so we can aggregate them up to a meaningful measure of overall progress. Lastly, to be most useful for research, you would want to link all this data up to other datasets, such as data on firm financials, or underlying academic research and patents. Adapted from Knittel (2011) It would be a huge undertaking, but with modern computing power, I’m not sure it’s much worse than computing many other economic statistics, from inflation to GDP. And it would help remove some serious measurement issues from research to understand what drives innovation. Can we quantify the impact of information and knowledge storage/sharing innovations, on the progress of innovation? Things like libraries, and more modern knowledge management systems. And obviously things like mobile characters and the printing press etc. What is the value of knowledge commons? 
- Gianni Giacomelli  Let’s start with the assumption that most good inventions draw on the accumulated knowledge of human history. If you can’t accumulate knowledge, I think most innovation would proceed at a glacial pace. Tinkering would still occasionally result in an improvement, but the pace of change would be evolutionary and rarely revolutionary. So if it’s a question of having access to accumulated knowledge or not having access, the value of having access is probably close to the value of R&D. But our ability to store and access knowledge is itself a technology that can be improved via the means you suggest. What we want to study is the incremental return on improvements to this knowledge management system. Some papers have looked at this for public libraries, patent libraries, and Wikipedia (see the post Free Knowledge and Innovation). Having a public or patent library nearby appears to have helped boost the local rate of innovation by 10-20%. One way to interpret this is that an improvement in the quality of knowledge commons equivalent to the difference between a local and a distant library could buy you a 10-20% increase in the rate of innovation. Nagaraj, Shears, and de Vaan (2020) find significantly larger impacts from making satellite imagery data available, in terms of the number of new scientific papers this enabled. And other papers have documented how access to a knowledge commons changes what kinds of works are cited: Zheng and Wang (2020) looks at what happened to Chinese innovation when the Great Firewall cut off access to Google; Bryan and Ozcan (2020) show requirements to make NIH-funded research open access increased citations of it. In each case, it’s clear access had a measurable impact, but it’s tough to value. As an aside, my own belief is improving the knowledge commons gives you a lot of bang for your buck, especially from the perspective of what an individual researcher can accomplish. But of course, I’m biased. I was wondering if there has been a significant long-term impact of internet on economic growth and if there is any evidence to suggest that any of the economic growth in the last 2 decades can be attributed to the rise of internet - Daniyal from Pakistan There are at least two different ways the internet affects economic growth. First and most obviously, it directly creates new kinds of economic activity - think Uber, Netflix, and Amazon. Unsurprisingly, this digital economy has been growing a lot faster than the non-digital economy (6.3% per year, compared to 1.5% per year for the whole economy, over 2012-2020 in the USA), but since it only accounts for about 10% of the US economy, the impact on headline growth can’t have been too big yet. So, sure, the internet has contributed to faster economic growth, though the effect isn’t particularly large. Second, and more closely related to the themes of this newsletter, the internet can also affect the overall rate of innovation (including innovation in non-internet domains). It allows researchers to collaborate more easily at a distance and democratizes access to frontier ideas. These impacts of the internet have been a big theme of my writing - see the post Remote work and the future of innovation for a summary of that work, and more specifically the post The internet, the postal service, and access to distant ideas.
I think on the whole, the internet has likely been good for the overall rate of innovation; we know, for example, that it seems to help regions that are geographically far from where innovation is happening keep up. It also helps enable new kinds of collaboration which, though possibly less disruptive than their more traditional counterparts, might simply not exist at all otherwise. It does seem a bit surprising the effect is not much larger though; why doesn’t having easy access to all the world’s written information multiply innovation by a factor of 10 or 100? The fact that it doesn’t suggests we should think of innovation as being comprised of lots of factors that matter (see this overview for some of those factors) and it’s hard to substitute one for the other. We get bottle-necked by the factors that are in short supply. To take a concrete example, it may be that the world’s written information is now at our fingertips, but the overall number of people interested in using it to innovate hasn’t increased much. Or that written information is rarely enough to take an R&D project across the finish line, so that we’re bottlenecked by the availability of tacit knowledge. Research in developing countries is both cheaper and of lower perceived quality than that which is carried out in developed countries. To what extent are these two outcomes separable? Do you think it's conceivable that the former can improve to the extent that a large share of technologically sophisticated R&D will be outsourced in the future? - Aditya I take it as a given that talent is equally distributed around the world, but I think developing countries face at least two main disadvantages in producing research that is perceived to be high quality. First, research can be expensive and rich countries can provide more support to researchers - not only salary support, but also all the other non-labor inputs to research.  Second, rich countries like the USA have tended to attract a disproportionate share of top scientific talent. As I’ve argued, while academic work is increasingly performed by teams collaborating at a distance, most of the team members seem to initially get to know each other during periods of physical colocation (conferences, postdocs, etc). Compared to a researcher physically based in a rich country on the scientific frontier, it will be harder for a researcher based in a developing country to form these relationships. Compounding this challenge, researchers in developing countries may face additional challenges to developing long-distance relationships: possibly linguistic differences, internet connectivity issues, distant time zones, lack of shared cultural context, etc. Moreover, we have some evidence that in science, the citations a paper receives are better predicted by the typical citations of the team member who tends to get the least citations on their own work. That means the returns to having access to a large pool of collaborators is especially high - you can’t rely on having a superstar, you need a whole team of high performers. Lastly, the...
Taking Your Questions
Dear reader, This newsletter recently got its 10,000th subscriber. To celebrate, I thought I would try something new and take reader questions. So: ask me anything by using this Google form. I’ll try to get through as many questions as I can in the next post, which will hopefully come out the week of November 14th. Cheers everyone and thanks for your interest in this project, Matt
Are Technologies Inevitable?
Dear reader, This week’s post is not the usual thing. I designed New Things Under the Sun to feature two kinds of articles: claims and arguments. Almost everything I write is a claim article (or an update to them). Today’s post is the other kind of article, an argument. The usual goal of a claim article is to synthesize several academic papers in service of assessing a specific narrow claim about innovation. Argument articles live one level up the chain of abstraction: the goal is to synthesize many claim articles (referenced mostly in footnotes) in service of presenting a bigger picture argument. That means in this post you won’t see me talk much about specific papers; instead, I’ll talk about various literatures and how I think they interact with each other. Also, this article is really long; probably about twice as long as anything else I’ve written. Rather than send you the whole thing in email, I’m sending along the introduction below, an outline, and a link to the rest of the article, which lives on the NewThingsUnderTheSun.com. Alternatively, you can listen to a podcast of the whole thing here. Cheers everyone and thanks for your interest, Matt Subscribe now Take me straight to the whole article Are Technologies Inevitable? Introduction In a 1989 book, the biologist Stephen Jay Gould posed a thought experiment: I call this experiment “replaying life’s tape.” You press the rewind button and, making sure you thoroughly erase everything that actually happened, go back to any time and place in the past… then let the tape run again and see if the repetition looks at all like the original.” p48, Wonderful Life Gould’s main argument is: …any replay of the tape would lead evolution down a pathway radically different from the road actually taken… Alter any early event, ever so slightly and without apparent importance at the time, and evolution cascades into a radically different channel. p51, Wonderful Life Gould is interested in the role of contingency in the history of life. But we can ask the same question about technology. Suppose in some parallel universe history proceeded down a quite different path from our own, shortly after Homo sapiens evolved. If we fast forward to 2022 of that universe, how different would the technological stratum of that parallel universe be from our own? Would they have invented the wheel? Steam engines? Railroads? Cars? Computers? Internet? Social media? Or would their technologies rely on principles entirely alien to us? In other words, once humans find themselves in a place where technological improvement is the rule (hardly a given!), is the form of the technology they create inevitable? Or is it the stuff of contingency and accident? In academic lingo, this is a question about path dependency. How much path dependency is there in technology? If path dependency is strong, where you start has a big effect on where you end up: contingency is also strong. But if path dependency is weak, all roads lead to the same place, so to speak. Contingency is weak. Some people find this kind of thing inherently fun to speculate about. It’s also an interesting way to think through the drivers of innovation more generally. But at the same time, I don’t think this is a purely speculative exercise. My original motivation for writing it was actually related to a policy question. How well should we expect policies that try to affect the direction of innovation to work? How much can we really direct and steer technological progress? 
As we’ll see, the question of contingency in our technological history is also related to the question of how much remains to be discovered. Do we have much scope to increase the space of scientific and technological ideas we explore? Or do we just about have everything covered, and further investigation would mostly be duplicating work that is already underway? I’ll argue in the following that path dependency is probably quite strong, but not without limits. We can probably have a big impact on the timing, sequence, and details of technologies, but I suspect major technological paradigms will tend to show up eventually, in one way or another. Rerun history and I doubt you’ll find the technological stratum operating on principles entirely foreign to us. But that still leaves enormous scope for technology policy to matter; policies to steer technology probably can exert a big influence on the direction of our society’s technological substrate. The rest of the post is divided into two main parts. First, I present a set of arguments that cumulatively make the case for very strong path dependency. By the end of this section, readers may come close to adopting Gould’s view: any change in our history might lead to radically different trajectories. I think this actually goes too far. In the second part of the essay, I rein things in a bit by presenting a few arguments for limits to strong path dependency. The rest of the piece goes on to make the following argument:

Part One: The Case for Strong Path Dependency
- Small scale versions of replaying the technology tape point to path dependency being at least big enough to notice
- The landscape of possible technologies is probably very big, because
  - Combinatorial landscapes are very big
  - Technology seems to have an important combinatorial element
- Our exploration of this space seems a bit haphazard and incomplete
- From the constrained set of research and invention options actually discovered, an even smaller set get an early lead, often for highly contingent reasons, and then enjoy persistent rich-get-richer effects

Part Two: The Limits of Path Dependence
- It may not matter that the landscape of technological possibility is large, if the useful bits of it are small. This may be plausible because
  - This might be the case for biology
  - It is probably possible to discover the small set of universal regularities in nature via many paths
- Human inventors can survey the space of technological possibility to a much greater degree than in biological evolution
- A shrinking share of better technologies combined with our ability to survey the growing combinatorial landscape can yield exponential growth in some models

Read the whole thing here

As always, if you want to chat about this post or innovation in general, let’s grab a virtual coffee. Send me an email at mattclancy at hey dot com and we’ll put something in the calendar. New Things Under the Sun is produced in partnership with the Institute for Progress, a Washington, DC-based think tank. You can learn more about their work by visiting their website.
Remote Breakthroughs
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps: Apple, Spotify, Google, Amazon, Stitcher. Remote work seems to be well suited for some kinds of knowledge work, but it’s less clear that it’s well suited for the kind of collaborative creativity that results in breakthrough innovations. A series of new papers suggests breakthrough innovation by distributed teams has traditionally been quite difficult, but also that things have changed, possibly dramatically, as remote collaboration technology has improved. Subscribe now Distant and Colocated Collaboration Are Not Alike We can begin with Van der Wouden (2020), which looks at the history of collaboration between inventors on US patents, over the period 1836 to 1975. To build a useful dataset, he has to extract the names and locations of inventors from old patent documents, which have been digitized into super messy text files by Google. These digitized patents are rife with misspellings (because optical character recognition scanning is very imperfect for old documents) and lacking in much of any standardization. It’s a ton of work that involves fuzzy matching text strings to a big list of names which in turn is drawn from the US census, modern patent documents, and an existing database of inventor names. And that’s only the first step - it just tells you the names of people mentioned in a patent, not whether those names are inventors, rather than lawyers or experts. To figure out who is an inventor, Van der Wouden uses a set of classification algorithms that predict the probability a mentioned name is an inventor using a dataset of known inventors linked to patents. It’s not a perfect method, but it is able to find an inventor on about 90% of historical patents. Moreover, the people it identifies as top patent holders, and the number of patents they hold, match pretty closely with other lists of top patentees in US history. He also has to do similar work to pull out the locations mentioned on a patent. Now that he has an estimate of how many people worked on each patent, and where they lived, Van der Wouden can start to look at how common collaboration and remote collaboration are. We can see that collaboration really began to take off in the 1940s and that the probability a team of inventors didn’t reside in the same city rose from under 5% in 1836 to over 10% by 1975. From Van der Wouden (2020) Van der Wouden next tries to measure the complexity of a patented invention with an approach originally used in another paper, Fleming and Sorenson (2004).1 Fleming and Sorenson attempted to create a measure of how “fussy” technological classifications were, based on how well they seem to play nice with other technologies (fussy is my term, not theirs, but I think it captures what they’re going for in a colloquial way). If a technological classification is frequently attached to a patent alongside a wide range of other classifications, they’re going to say this isn’t a very “fussy” technology. It can be used in plenty of diverse applications. On the other extreme, if a classification is only ever assigned to a patent with one other classification, then we’re going to assume the technology is very sensitive and very fussy. It only works well in a very specific context.
While this measure is a bit ad-hoc, Fleming and Sorenson also did a survey of inventors and showed their measure is correlated with inventors’ self-assessments of how sensitive their own inventions are to small changes, and that this measure is not merely picking up how novel or new the technology is; it’s picking up something a bit different. Returning to Van der Wouden (2020), his measure says a patent is more complex if it involves more technologies, and if these technologies are “fussy.” There are two key results: complex patents are more likely to be the work of teams. And among patents by a team of inventors, the inventors are more likely to reside in the same city if the patent is more complex. It seems that, at least over 1836-1975, it is hard to do complex work at a distance. Lin, Frey, and Wu (2022) pick up Van der Wouden’s baton and take us into the present day. They look at the character of both patents and academic papers produced by colocated and remote teams over 1960-2020 (actually 1975-2020 for patents), but focusing on how disruptive a paper or patent is. To measure disruption, they use an increasingly popular measure based on citations. To simplify a bit, the idea here is that if a paper or patent is disruptive, you’re not going to cite the stuff it cites, because the paper or patent has rendered those older ideas obsolete. After Einstein, you no longer cite Newton. On the other hand, if a paper is an incremental improvement within a given paradigm, you are likely to cite it as well as its antecedents. This disruption measure quantifies this notion: for some focal document, it’s based on how many citations go to the focal document alone relative to how many citations go to the focal document as well as the documents cited by the focal document. Across 20mn research articles and 4mn patents, Lin, Frey, and Wu find that, on average, the farther away the members of the team are from one another, the less likely the paper is to be disruptive. From Lin, Frey, and Wu (2022) So, over 1836-1975 the patents of inventors who reside in the same cities tended to be more complex, in the sense that they either drew on more technologies, or more technologies that don’t have a long history of successfully being combined with other technologies. And over 1975 to 2020, patents with inventors residing in the same city were more likely to be disruptive, in the sense that they are more likely to receive citations that do not also reference earlier work. Does Distance Inhibit Strange Combinations? These measures are not picking up exactly the same thing, but neither are they as different as they might seem at first. As discussed in a bit more detail here, Lin, Evans, and Wu (2022) find that papers that draw on novel combinations of ideas (in this paper, proxied by the kind of journals a paper cites) are also more likely to be disruptive. In other words, it might well be that the reason Lin, Frey, and Wu find papers by distant teams are less likely to be disruptive is because dispersed teams have a harder time connecting different ideas. We’ve got a few pieces of evidence that support the notion that remote teams have a harder time making novel connections across ideas. First, both Berkes and Gaetani (2021) and Duede et al. (2022) find some evidence that colocation is an important channel for exposure to intellectually distant concepts.
As discussed here, Berkes and Gaetani (2021) show that:
- The patents of inventors residing in denser parts of cities comprise a more diverse set of technologies
- The set of technologies that comprise the patents of denser parts of cities is more unorthodox: two different technologies might rarely originate from the same geographical location, but when they do that area is more likely to be a dense part of a city
- The patents of inventors residing in denser parts of cities are more likely to feature unusual combinations of technologies themselves.

That’s all consistent with the idea that being physically around lots of different kinds of inventive activity increases the chances you draw an unexpected connection between two disparate concepts. Duede and coauthors provide some fine-grained evidence from academia. They have a big survey where they ask thousands of academics across many fields about citations they made in some of their recent work. Among other things, they asked respondents how well they knew the cited paper, as well as how influential the citation was to the respondent’s work. In the latter case, respondents rated their citations on a scale from “very minor influence”, which meant the respondent’s paper would have been basically unchanged without knowledge of the cited reference, to “very major influence”, which meant the cited reference motivated the entire project. If we have a way to measure the geographic distance between the authors and the “intellectual distance” between the citation and the author’s normal expertise, we can see how the two are related: does being close in space facilitate learning about ideas you wouldn’t normally know about? Computing distance in space is straightforward: Duede and coauthors just code whether authors are in the same department, same institution, same city, or same country. To measure intellectual distance, they rely on the similarity of the title and abstract of the citing and cited paper, as judged by natural language processing algorithms. This algorithm judges papers to be more similar if they contain words that are themselves more closely related to each other. Duede and coauthors find if you and the author of a paper you cite are at the same university, then you are indeed more likely to say you know the cited work well and that it was influential on you. But what’s interesting is that the strength of this relationship is stronger if the cited and citing paper are less similar to each other. In other words, if you cite a paper that’s surprising, given the topic you are working on, you are more likely to say you know that paper well and that it influenced you if the author is at the same university. That’s quite consistent with colocation being a useful way to learn about ideas you wouldn’t otherwise encounter in the course of your normal knowledge work. The second line of evidence is larger, but less direct: physical proximity seems to be quite important for helping people form new relationships, especially relationships that wouldn’t have been formed in the course of ordinary knowledge work. I’ve looked at this line of evidenc...
September 2022 Updates
New Things Under the Sun is a living literature review; as the state of the academic literature evolves, so do we. This post highlights some recent updates.

Same Data, Same Question, Different Answers

The post “One question, many answers” looked at the “many analysts” literature, wherein a bunch of different researchers and research teams independently try to answer the same set of questions, using the exact same dataset. Surprisingly, it’s not at all uncommon for different teams to arrive at different conclusions. I’ve added to this post another recent paper, Menkveld et al. (2021):

Finally, Menkveld et al. (2021) wrangles 164 teams of economists to test six different hypotheses about financial markets using a big dataset of European trading data. Testing these hypotheses required participants to define and build their own measures and indices, and to assess whether those had increased or decreased over time. As should be no surprise by now, the teams came up with an enormous range of estimates. For example, on one hypothesis - how has the share of client volume in total volume changed - 4% of teams found it had increased, 46% found it had declined, and 50% found no statistically significant change over time.

The updated post integrates discussion of Menkveld et al. (2021) throughout, where it echoes the findings of other papers in this genre, for example, in its finding that the dispersion of expertise does not seem to account for much of the dispersion in results. Instead, the post argues the dispersion stems from inadequacies in our “methodological technology.” There are many different points at which researchers can make different, defensible, research choices, and those differences add up.

One place researchers can make different decisions is at step one: what counts as evidence that answers the stated research question? Another recent paper - Auspurg and Brüderl (2021) - suggests such differences were an important factor in the divergent outcomes found in one of the most famous of these studies, Silberzahn et al. (2018).

Auspurg and Brüderl (2021) provides some interesting detail on what drove different answers in [Silberzahn et al. (2018)], by digging back into the original study’s records. After analyzing each team’s submitted reports, Auspurg and Brüderl argue that the 29 teams were actually trying to answer (broadly) four different questions. Recall [Silberzahn et al. (2018)’s] research prompt was “are soccer players with dark skin tone more likely than those with light skin tone to receive red cards from referees?” Auspurg and Brüderl argue some teams interpreted this quite literally, and sought to compute the simple average difference in the risk of red cards between dark- and light-skinned players, with no effort to adjust for any other systematic differences between the players. Others thought this was a question specifically about racial bias. For them, the relevant hypothetical was the average difference in the risk of a red card between two players who were identical except for their skin tone.
Yet others interpreted the question as asking “if we are trying to predict the risk of red cards, does skin tone show up as one of the most important factors?” And still others thought of the whole project as being about maximizing the methodological diversity used to tackle a question, and saw their role as trying out novel and unusual methodologies, rather than whatever approach they thought most likely to arrive at the right answer!

Menkveld and coauthors’ paper on financial markets provides some other evidence that tighter bounds on what counts as evidence can reduce, though not eliminate, the dispersion of answers. Recall this paper asked researchers to test six different hypotheses. Some of these hypotheses were relatively ambiguous, such as “how has market efficiency changed over time?”, leaving it to researchers to define and implement a measure of market efficiency. Other hypotheses permitted much less scope for judgment, such as “how has the share of client volume in total volume changed?” The dispersion of answers for the more tightly defined questions was much narrower than for the more nebulous ones.

The updated post also discusses some promising evidence that when teams are allowed to discuss each other’s results and offer feedback, this can substantially reduce the dispersion in their results.

Read the whole thing

More Evidence Publication Bias is Real

The many analysts literature is worrying enough, but publication bias compounds the problem it identifies. Publication bias is when the probability a result gets published depends on the result itself. In general, we worry that there is a preference for novel results that identify some new statistical relationship, as opposed to results that find no statistically significant correlation between variables. This can create a biased picture of the evidence, because if so-called “null results” are not publishable, a review of the literature will seem to find unanimous evidence for some statistical relationship. The post “Publication bias is real” reviews various lines of evidence on the existence of publication bias and its magnitude. I’ve added to this post a new short section on experimental papers.

As a first step, let’s consider some papers that use experiments to explicitly see whether reviewers treat papers differently, depending on the results. In each of these papers, reviewers receive descriptions of papers (or actual papers) that are basically identical, except for the results. For one random set of reviewers, the papers (or descriptions of papers) obtain statistically significant results; for the other, these results are changed to be statistically insignificant. But as much as possible, what the reviewers see is otherwise unchanged. The papers then compare the recommendations and ratings of the two groups of reviewers to see if the non-significant results are rated more poorly or given lower recommendations than the significant ones.

We have three papers from different fields. Emerson et al. (2010) has 110 actual reviewers for orthopedics journals do a standard peer review of one of two fictitious papers, which are identical but for the results. Berinsky et al. (2021) email short descriptive vignettes of research papers to all faculty in US political science departments that grant PhDs and have respondents fill out surveys about these vignettes, getting about 1,000 responses. Similarly, Chopra et al.
(2022) collect responses to short descriptive vignettes of economics papers from about 500 economists at top-200 departments. These studies varied a bit in exactly how they measured support for publication and what other dimensions they studied, but in all cases reviewers believed papers with statistically significant results were better candidates for publication. The figure below tracks, in dark blue, the probability a given reviewer would support publication among the reviewers who saw a statistically significant finding, while light blue illustrates the same for reviewers who saw a statistically insignificant result in an otherwise identical paper.

Orthopedics data from Table 1 of Emerson et al. (2010). Political science data from the in-text description of Figure 3 of Berinsky et al. (2021). Economics data computed from the in-text description of Table 3 of Chopra et al. (2022).

To emphasize - the only difference in the papers or paper vignettes that respondents read in the above figure was whether the result was described as statistically significant or not. Holding everything else fixed - the research question, the methodology, the quality of the writing, the sample size, etc. - reviewers were less likely to recommend publication for the versions of the papers that found non-significant results. The rest of the post looks at other evidence that takes a variety of complementary approaches.

Read the whole thing

Weaker Methods → Worse Bias?

Finally, the post “Why is publication bias worse in some disciplines than others?” seeks to get some answers about why we have publication bias, and more specifically, why some fields seem to have it worse than others. This is a subject where the experimental literature discussed above has been really clarifying, I think. I have largely rewritten a discussion of possible reasons why publication bias might vary across fields:

Suppose the root cause of publication bias is that journals want to highlight notable research, in order to be relevant to their readership. There are at least two different ways this can lead to publication bias, depending on what journals view as “notable” research.

First, it might be that journals consider surprising results to be the most notable. After all, if we’re not surprised by research, doesn’t that imply we already sort-of knew the result? And what would be the point of that? But this leads to publication bias if results that challenge the prevailing wisdom are easier to publish than results that support it. In aggregate, the weight of evidence is distorted because we do not observe the bulk of the boring evidence that just supports the conventional wisdom.

This could lead to variation in publication bias across fields if fields vary in the breadth of what is considered surprising. For example, we could imagine one field that is very theoretically contested, with different theories making very different predictions. In that field, perhaps everything is surprising in light of some theory, and so most results are publishable. In this field, we might not observe much evidence of publication bias. In another field (social science?), perhaps there is an unstated assumption that most hypotheses are false, and so null results are perceived as boring and hence difficult to publish. In this field, we would observe a lot of evidence of publication bias...
What if we could automate invention?
Before today’s post, a reminder: The Institute for Progress is hosting a free 6-week online PhD course titled “The economics of ideas, science and innovation.” The deadline to apply is the end of today! Learn more here! Now for your regularly scheduled content…

Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here. You can listen to this post above, or via most podcast apps: Apple, Spotify, Google, Amazon, Stitcher.

These are weird times. On the one hand, scientific and technological progress seems to be getting harder. Add to that slowing population growth, and it’s possible economic growth over the next century or two might slow to a halt. On the other hand, one area where we do seem to be observing rapid technological progress is artificial intelligence. If that goes far enough, it’s easy to imagine machines being able to do all the things human inventors and scientists do, possibly better than us. That would seem to pull in the opposite direction, leading to accelerating and possibly unbounded growth: a singularity.

Are those the only options? Is there a middle way? Under what conditions? This is an area where some economic theory can be illuminating. This article is a bit unusual for New Things Under the Sun in that I am going to focus on a small but, I think, important part of a single 2019 article: “Artificial Intelligence and Economic Growth” by Aghion, Jones, and Jones. There are other papers on what happens to growth if we can automate parts of economic activity,1 but Aghion, Jones, and Jones (2019) is useful because (among other things) it focuses on what happens in economic growth models if we automate the process of invention itself. We’ll see that automating invention does indeed lead to rapidly accelerating growth, but only if you can completely automate it. If not, and if the parts you can’t automate are sufficiently important, then Aghion, Jones, and Jones show growth will be steady: no singularity. I’m going to try to explain their results using a simplified model that I think gives the intuitions but doesn’t require me to write any math equations.

A Baseline: Human-Driven Innovation

Before getting into Aghion, Jones, and Jones’ model, let’s see what these models predict would happen if innovation continued to be a mainly human endeavor. To start, we need a way to measure technological progress. For this simplified model, things will be easier if we can just assume technology proceeds in discrete steps, so I’m going to use something a little unusual for economics: the Kardashev scale. This is a hypothetical measure of a civilization’s technological level based on the amount of energy it can harness. In the usual formulation, civilizations come in three types. A Type 1 civilization can harness all the energy from its parent star that reaches its home planet. A Type 2 civilization can harness all the energy emitted by its parent star. A Type 3 civilization can harness all the energy emitted by its galaxy!

A typical Kardashev scale. Wikipedia

The differences between each type are gigantic. It’s estimated that a Type 2 civilization would use about 10^10 times as much energy as a Type 1 civilization, and a Type 3 civilization about 10^10 times as much as a Type 2. Let’s make things a bit more manageable by creating smaller 0.1 Kardashev increments.
A Type 1.1 civilization uses 10 times as much energy as a Type 1.0 civilization, a Type 1.2 civilization uses 10 times as much energy as a Type 1.1, and so forth. We can think of a staircase that goes up three floors: the first floor above ground is a Type 1 civilization, the second floor is a Type 2 civilization, the third floor is a Type 3 civilization, and there are ten steps between each floor. By this definition, we are currently sitting at something like a Type 0.7 civilization, since the total energy reaching Earth from the sun is maybe a thousand times as much as the energy our civilization currently uses. We’ll measure the rate of technological progress by the length of time it takes us to climb a 0.1 increment up the Kardashev scale.

Let’s now make a few assumptions about how economic progress happens. They’re simple and unrealistic:

1. Everyone in the world devotes themselves full time to inventing.
2. Global population grows by 0.7% per year, which means it doubles every 100 years.
3. Inventing gets harder: every 0.1 step up the Kardashev scale takes twice as many inventor-years to achieve.

I’ll re-examine these assumptions towards the end of this post. But in our baseline scenario without any automation, this set of assumptions means civilization climbs one 0.1 step up the Kardashev scale every century. Each step is twice as “hard” as the last, in the sense that it takes twice as many inventor-years, but population growth ensures the number of inventors also doubles every century, so the overall growth rate is steady. Invention gets twice as hard, but there are twice as many inventors each century.

We can also see that if we tinkered with the growth rate of inventors, the growth rate of the economy would change. If population growth rises to 1.4% per year, the population of inventors doubles every 50 years, and we advance two Kardashev steps every century. On the other hand, if population stopped growing, then our growth rate would get cut in half with each 0.1 step up the Kardashev scale. We would still advance, but each step would take twice as long as the last to accumulate enough inventor-years.

Automating Invention

Now let’s tweak this model. Instead of humans doing all the inventing, let’s assume robots can do it and humans can relax. The key difference between this model and the last is that human population growth is a matter of fertility choices, and ever since we escaped the Malthusian trap, those don’t seem to depend much on the size of the economy. Specifically, we assumed the human population grew at 0.7% per year, no matter what our Kardashev level was. Robots, though, are something we build using economic resources. That means, as the economy grows larger, we are able to build more robots. Specifically, let’s assume that, like energy, the number of robots we can build also increases by 10x every time we go up a step of the Kardashev scale.

This results in a radically different dynamic than when we relied solely on human inventors. Now, every time we climb a 0.1 step up the Kardashev scale, we can throw 10x as many (robot) inventors at the next step as we could at the last one. True, innovation gets harder and it takes twice as many (robot) inventor-years to advance with each step, but since we have 10x as many (robot) inventors at each step, we still advance in 1/5 the time with each step.
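To check the arithmetic, here is a small Python sketch of the toy model above. It is my own illustration of this simplified setup, not Aghion, Jones, and Jones’ formal model; the growth_per_step parameter is an assumption standing in for how much inventive capacity multiplies with each 0.1 Kardashev step climbed.

```python
# A toy simulation of the simplified model above (my own illustration, not the
# formal Aghion, Jones, and Jones model). Each 0.1 Kardashev step requires
# twice as many inventor-years as the last; inventive capacity (human or robot
# inventors available per year) multiplies by growth_per_step with each step.

def years_per_step(n_steps, first_step_years=100.0, growth_per_step=2.0):
    """Return a list with how long each successive 0.1 Kardashev step takes."""
    cost = 1.0                              # inventor-years needed for step 1
    inventors_per_year = cost / first_step_years
    times = []
    for _ in range(n_steps):
        times.append(cost / inventors_per_year)
        cost *= 2.0                            # invention gets harder: 2x per step
        inventors_per_year *= growth_per_step  # capacity after climbing the step
    return times

# Human baseline: inventors double per step (0.7%/yr population growth works out
# to a doubling per step because each step ends up taking a century).
print(years_per_step(4, growth_per_step=2.0))    # [100.0, 100.0, 100.0, 100.0]

# Robot inventors: capacity grows 10x per step, so each step takes 1/5 as long.
print(years_per_step(4, growth_per_step=10.0))   # [100.0, 20.0, 4.0, 0.8]

# If capacity grew by less than 2x per step, each step would take ever longer.
print(years_per_step(4, growth_per_step=1.5))    # ~[100, 133, 178, 237]
```

The robot case reproduces the timeline traced out next, while the last call shows what happens when robot-building capacity grows more slowly than invention gets harder.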
If it takes a century to get from 0.6 to 0.7 (roughly where we are today), then it takes twenty years to get from 0.7 to 0.8, four years to get from 0.8 to 0.9, and under one year to go from 0.9 to 1.0! This acceleration continues at a blistering pace: once we reach a Type 1 civilization, we’ll get to a galaxy-spanning Type 3 civilization in less than three months!

The Pace of Progress with Robot Inventors

As with the human-inventor baseline, we can also tinker with our assumptions in this model to see what happens. Suppose every 0.1 step up the Kardashev scale only increases our ability to manufacture robots by 4x, instead of 10x. In that case, we’ll still have more than the 2x increase in robots needed to advance to the next Kardashev increment in the same amount of time, so growth still accelerates; just not as quickly. On the other hand, if the number of robots we can build increases by less than 2x for every increment up the Kardashev scale, then economic growth slows down over time (assuming the humans are still just relaxing and not trying to invent). The key question is whether our ability to expand our inventive capacity grows faster than the rate at which invention gets harder.

Taking Stock

This exercise has a lot of simplifications, but as a first approximation it seems to capture our intuitions about the weirdness of our times. If innovation is getting harder, and population growth is expected to slow, then maybe economic growth will steadily slow down over time. On the other hand, if we can automate innovation, the exact opposite can happen (provided invention doesn’t get harder too fast). The key point is that the second case has a self-amplifying dynamic that is absent from the first. Robot inventors improve the ability of the economy to make more robot inventors, which can lead to accelerating growth. Human inventors enjoy living in a richer economy, but their growth rate is independent of it.

Could we really jump from a Type 1 civilization to a Type 3 civilization in three months though, even in this simple model? Probably not, given our current understanding of the laws of physics. For example, it seems sensible to believe the universe’s speed limit would drastically slow down this process; the edge of the galaxy is close to a million light-years away, so maybe we can’t get a galaxy-spanning civilization for at least that long. That might seem like it’s missing the point of our illustrative model, but it actually points to something quite important: what tends to drive long-run growth is not our strengths but our weaknesses. We’ll come back to that.

A More Realistic Model of Automating Invention

This model captures our intuitions well, but it’s a bit too simple to help us think through the effects of automation, because in this model automation is an all-or-nothing proposition. Either the humans or the robots are the inventors. Aghion, Jones, and Jones propose a model that helps us think through the implications of a more realistic case where automation is partial but advancing in its capabilities. They suggest we think of the innovation...