A Layman’s Guide to Separating Causation from Correlation … and Noticing When Claims of Causality are Invalid

Imagine you’re the Minister for Education, deciding how large to make a school district. Larger school districts offer parents more school choice. You look at data from thousands of school districts and find that, in larger districts, child performance is better. You’re tempted to infer that district size increases child performance. But, as we know, correlation doesn’t imply causation. There are two alternative explanations:

  1. Reverse causality: child performance increases district size. When kids are doing better, a school district is allowed to expand.
  2. Omitted variables: neither district size nor child performance cause each other, but a third variable causes both. If parents care about education, they will both demand school choice (larger school districts) and also tutor their kids at home (increasing child performance).
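To see how an omitted variable can manufacture a correlation on its own, here is a minimal simulation sketch. Everything in it is hypothetical: the variable names, the sample size, and the assumption that parental interest raises both district size and child performance by the same amount. District size has zero true effect on performance in this made-up data, yet the two end up clearly correlated.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_districts(n=5000, seed=0):
    """Hypothetical data in which parental interest drives BOTH variables.

    District size never enters the formula for child performance,
    so its true causal effect is zero by construction.
    """
    rng = random.Random(seed)
    sizes, scores = [], []
    for _ in range(n):
        parental_interest = rng.gauss(0, 1)            # the omitted variable
        size = parental_interest + rng.gauss(0, 1)     # parents demand school choice
        score = parental_interest + rng.gauss(0, 1)    # parents tutor at home
        sizes.append(size)
        scores.append(score)
    return sizes, scores

sizes, scores = simulate_districts()
r = correlation(sizes, scores)
print(f"correlation between district size and performance: {r:.2f}")
```

Under these assumptions the correlation should come out at roughly 0.5, even though making a district larger would do nothing for its pupils.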

The problem of separating causality from correlation occurs in virtually every question that we try to study with data.

  • Showing that adults with a degree earn higher salaries doesn’t mean that university is a worthwhile investment. It might be that high-ability kids go to university and their high ability would have led to them earning more anyway (ability is an omitted variable). Or, kids who expect high salaries in the future (e.g. due to being from well-connected families) are more willing to take on the debt to go to university today (reverse causality).
  • Showing that socially responsible firms perform better doesn’t mean that social responsibility pays off. It might be that a firm can invest in social responsibility only once it is already performing well (reverse causality). Or, a forward-thinking management team (i) performs better, and (ii) gives thought to social issues (management quality is an omitted variable).
  • Showing that firms that cut investment subsequently perform badly doesn’t mean that cutting investment is bad. A McKinsey study makes the very strong causal claim to have found “finally, evidence that managing for the long-term pays off”. Their claim has been accepted as gospel by many, without recognising reverse causality – when a firm knows that its future prospects are poor, it should cut investment today. Presumably, this is what McKinsey advises its clients to do!
  • Showing that firms where a CEO has a high equity stake (owns a lot of shares) subsequently perform better doesn’t mean that equity incentives work. It might be that, when a CEO expects a firm to perform better in the future, she’s more willing to hold shares today.
  • Showing that a fad diet leads to weight loss doesn’t mean the diet caused weight loss. It might be that the desire to lose weight caused a person to choose the diet, and also to exercise more and it’s the latter that led to the weight loss (omitted variables).

The problem is even worse due to confirmation bias, as I explained in my recent TEDx talk, “From Post-Truth to Pro-Truth”. We jump to the conclusion that fits our view of the world.

  • Professors like me are all too eager to believe that our fascinating class is what got students that job.
  • We want to think that “nice guys finish first” – that responsible companies beat irresponsible ones.
  • Those who view businesses as evil and self-serving will want to think that those who cut investment (to pay dividends or buy back stock) get their comeuppance later.
  • People like me who spend their lives studying incentive compensation really want to believe that incentives actually matter, and that they’re not wasting their time.
  • Any proponent of a fad diet or slimming pill will claim they’re to thank for your six-pack abs.

We must be very, very careful about interpreting evidence as causal, when it only shows a correlation. Fortunately, there are now clever techniques to separate causality from correlation – (I) instruments, (II) natural experiments, and (III) regression discontinuity. This article aims to explain these techniques in simple language. But before starting, I must caution that these techniques are only valid in very rare cases. Some papers use one of the three “magic phrases” to try to claim that they have identified causality, and then back it up with as much technical language as possible to give the aura of statistical sophistication and batter the reader into submission. Instead, as I’ll explain, you don’t need to be a statistical expert to see whether the authors are trying to pull the wool over your eyes. All you need is common sense. For each of these techniques, I have a “Reader Beware” section on what to look for. The intended audience for this post is practitioners, who might use academic research to guide policy or practice, so I will paint with a broad brush. For a more detailed academic treatment, please see Roberts and Whited (2012).

I begin by defining terms. We are interested in the causal effect of an independent variable (e.g. district size, degree) on a dependent variable (e.g. child performance, future income). A causal interpretation is only possible if the independent variable is exogenous (randomly assigned) – if university places were randomly given to some school leavers and not others, and those that went to university earned more, we could infer that the degree caused the higher salary. However, most variables are endogenous. They are not randomly assigned, but the product of something else – the dependent variable itself (expecting a high future income encourages you to get a degree today – reverse causality), or a third variable that also affects the dependent variable (high ability makes you more willing to get a degree – omitted variables).

I. Instruments

How do we solve the problem that the independent variable is endogenous? In a medical trial, you would randomly assign the independent variable (a new drug) by giving it to some patients (a treated group) and a placebo to others (a control group). But, we can’t do this in social sciences – we can’t force some firms to give their CEOs high equity stakes and others to give low equity stakes.

So, what we want is something as-good-as-random. This is an instrument – something that randomly shocks the independent variable, just like random assignment of a new drug. In the school district example, Hoxby (2000) used rivers as a shock to district size. In the U.S., school districts were formed in the 18th century, when crossing a river was difficult due to no cars and few bridges, and so districts very rarely crossed rivers. Hoxby found that school districts that were naturally smaller, due to rivers, exhibited worse performance. Since these districts were “randomly” assigned a small size, the results imply a causal effect from district size to child performance.

A valid instrument must be:

  1. Relevant. It must affect the independent variable of interest. Rivers are relevant, as they placed natural boundaries on district size.
  2. Exogenous. It must not affect the dependent variable except through the independent variable. Rivers are unlikely to affect a child’s performance other than through affecting district size. (Technically, this is referred to as “satisfying the exclusion restriction”; I will use “exogenous” for short).

To give an example of the ingenuity of some valid instruments:

  • Does a family firm perform better when it appoints a family CEO rather than an external CEO, or worse due to nepotism? If family-run firms perform better, it could be due to reverse causality: if the firm is performing well, the owners will keep it within the family; if it’s not, they will need an outsider to fix it. Bennedsen, Nielsen, Perez-Gonzalez, and Wolfenzon (2007) use the gender of the CEO’s first-born child as an instrument. Gender is:
    1. Relevant: when the first child is male, family owners are more likely to pass on control to a family CEO than when the first child is female.
    2. Exogenous: it’s unlikely that the gender of a CEO’s first child will affect the performance of the family firm other than through affecting whether the next CEO is from within or outside the family.
  • Rather than studying whether firms actually have a family CEO, the authors predict whether firms will have a family CEO based on the gender of the first-born child. They found that firms with a higher probability of having a family CEO (due to having a male first child) perform worse. Since whether a firm is predicted to have a family CEO is random – because the gender of the first child is random – this implies that family CEOs cause worse performance.
  • Does watching TV cause autism? If the correlation is positive, it may be that autistic kids watch TV more (reverse causality), or neglectful parents both abandon their kids to watch TV, and also cause autism (omitted variables). Waldman, Nicholson, Adilov, and Williams (2008) use rainfall as an instrument. Rainfall is:
    1. Relevant: rainfall causes kids to watch TV, since they can’t play sport outside.
    2. Exogenous: rainfall doesn’t cause autism other than through its impact on TV-watching (it doesn’t suddenly cause parents to be neglectful).
  • Rather than studying the actual number of hours of TV-watching, the authors predict TV-watching based on rainfall. They found that kids with higher predicted TV watching are more likely to be autistic. Since predicted TV-watching is random – because rainfall is random – this implies that watching TV causes autism.
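The two-step logic in the examples above (predict the independent variable from the instrument, then relate the outcome to that prediction) can be sketched on simulated data. This is an illustration under assumptions I have made up, not the method of either paper: the true effect is set to 0.5, the instrument is valid by construction, and an unobserved confounder pushes up both the treatment and the outcome.

```python
import random

def ols(xs, ys):
    """Return (intercept, slope) of a least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

TRUE_EFFECT = 0.5  # assumed causal effect of the treatment on the outcome

rng = random.Random(1)
z, x, y = [], [], []
for _ in range(20000):
    instrument = rng.gauss(0, 1)    # random shock, affects y only through x
    confounder = rng.gauss(0, 1)    # omitted variable, never observed
    treatment = instrument + confounder + rng.gauss(0, 1)
    outcome = TRUE_EFFECT * treatment + 2 * confounder + rng.gauss(0, 1)
    z.append(instrument)
    x.append(treatment)
    y.append(outcome)

# Naive regression of outcome on treatment: contaminated by the confounder.
_, naive_slope = ols(x, y)

# Stage 1: predict the treatment using only the instrument.
intercept_1, slope_1 = ols(z, x)
x_predicted = [intercept_1 + slope_1 * zi for zi in z]

# Stage 2: regress the outcome on the predicted treatment.
_, iv_slope = ols(x_predicted, y)

print(f"naive estimate: {naive_slope:.2f}")
print(f"instrumented estimate: {iv_slope:.2f} (true effect {TRUE_EFFECT})")
```

In this set-up the naive estimate should land well above 1, while the instrumented estimate should sit close to the assumed 0.5. A real analysis would use a dedicated routine, since running the second stage by hand like this gives the wrong standard errors.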

Reader Beware

Often authors will claim causality by using the magic word “instruments” (or “instrumental variables”), when the instruments are actually invalid because they are not exogenous (it is relatively easy to find instruments that are relevant). A reader should ask the following questions:

  • Can the “instrument” affect the dependent variable other than through the independent variable? Let’s return to the earlier question of whether the CEO’s equity stake causes better future performance. We might use CEO age as an instrument for her equity stake, as older CEOs tend to have accumulated more shares. But, CEO age is not exogenous, since it might directly affect firm performance. Older CEOs might perform better as they are more experienced, or worse as they are entrenched.
  • What causes the instrument to vary to begin with, and could this factor also affect the dependent variable?  Even if CEO age did not directly affect firm performance (older CEOs are just as good as younger CEOs), whatever drives cross-sectional variation in age may do so. For example, trouble in the firm’s business model may lead to a firm retaining an old CEO, and also reduce firm performance.
  • Is the instrument a lagged variable? Some papers use last year’s independent variable as an instrument – in our setting, this would be the CEO’s equity stake last year. It’s relevant – last year’s equity stake will be linked to this year’s, since equity stakes tend to be stable over time. Surely it’s also exogenous – since it’s last year’s stake, it was already set in advance of this year? But, whatever causes this year’s stake to be endogenous also likely causes last year’s stake to be endogenous. Last year, the CEO could have forecast performance to be good this year, and so chosen to hold more shares.
    • This is also known as the “post hoc ergo propter hoc” (after this, therefore because of this) fallacy. Just because event Y follows event X, this does not mean X caused Y.
  • Is the instrument a group average? Some papers use a group average as an instrument – in our setting, this would be the average equity stake among CEOs in the same industry as firm X. It’s relevant – if rival firms are giving their CEOs lots of equity, firm X must do so too, to remain competitive. Surely it’s also exogenous – the equity stake of other CEOs shouldn’t affect firm X’s performance? But, any endogeneity in firm X’s equity stake is simply soaked up at the industry level (see Section 2.3.4 of Gormley and Matsa (2014) for more detail). If the industry as a whole is performing well, firm X will perform well, and CEOs of other firms in the industry will gladly hold high equity stakes.
  • Are the authors up-front about their instruments? A tell-tale sign is when, in the introduction to a paper, authors say something like “we control for endogeneity using instruments and show that the results remain robust” without explaining what the instruments are until much later in the paper. Finding valid instruments is very difficult and it is the authors’ responsibility to explain what the instruments are and justify why they are relevant and exogenous. Not being up-front about what the instruments are suggests the authors may themselves not be sufficiently convinced about their validity, and so they bury them deep into the paper.
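The lagged-variable problem in the list above can be made concrete with a small simulation. The numbers are assumptions chosen purely for illustration: the true effect of the equity stake is set to 0.5, and a persistent, unobserved forecast of the firm’s prospects drives last year’s stake, this year’s stake, and performance.

```python
import random

def covariance(xs, ys):
    """Population covariance of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

TRUE_EFFECT = 0.5  # assumed causal effect of the equity stake on performance

rng = random.Random(3)
stake_last_year, stake_this_year, performance = [], [], []
for _ in range(20000):
    prospects = rng.gauss(0, 1)                 # CEO's private forecast, persists over time
    lagged = prospects + rng.gauss(0, 1)        # last year's stake already reflects it
    current = 0.5 * lagged + prospects + rng.gauss(0, 1)   # stakes are stable over time
    perf = TRUE_EFFECT * current + 2 * prospects + rng.gauss(0, 1)
    stake_last_year.append(lagged)
    stake_this_year.append(current)
    performance.append(perf)

# With a single instrument, the instrumental-variable estimate is
# cov(instrument, outcome) / cov(instrument, treatment).
iv_estimate = (covariance(stake_last_year, performance)
               / covariance(stake_last_year, stake_this_year))

print(f"estimate using last year's stake as the instrument: {iv_estimate:.2f}")
print(f"true effect: {TRUE_EFFECT}")
```

Under these assumptions the estimate should come out around 1.5, three times the true effect. The lag is relevant, but it inherits the same endogeneity as the variable it is meant to fix.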

Even though some papers may claim to have statistically proven exogeneity, there is no valid test to do this. So, the best way to assess exogeneity is to use common sense – could the “instrument” (or whatever drives the instrument) affect the dependent variable other than through the independent variable? Note that no instrument will be completely exogenous and one can always spin stories to argue that it is not. For example, one could spin a story that rivers directly affect child performance, because when kids look out onto a river, they get inspired to be more creative. Ultimately, the reader must use common sense to see whether such stories are reasonable.

As an example of how authors might use complex technical language to overwhelm the reader into believing they have shown causality, consider the following extract:

“We reestimated our models using the xtabond2 procedure in STATA, which utilizes the generalized method of moments (GMM) model also known as system GMM. The xtabond2 procedure is designed for panels that may contain fixed effects and heteroscedastic and correlated errors within units, and employs first differencing, which instruments variables with suitable lags of their own first differences, to eliminate these issues and potential sources of omitted variable bias (please see Arellano & Bover, 1995; Blundell & Bond, 1998; Roodman, 2009). Furthermore, and importantly, xtabond2 also allows the ability to specify variables as endogenous to examine whether potential endogeneity is influencing findings.”

Sounds impressive, but when you strip back the technical language, you see that the authors are using “lags” (i.e. last year’s variable – more precisely, the change in the variable from last year), which is generally invalid for the reasons discussed above. I use the above extract in no way to poke fun at this paper, but to stress that it’s common sense, not technical sophistication, that enables us to assess validity. Other complex terms that authors sometimes use to throw up smoke and mirrors include “dynamic panel VAR models” and “Granger causality”. The latter, despite its name, does not prove causality. It asks whether one variable predicts another, but this is the “post hoc ergo propter hoc” fallacy.

II. Natural Experiments

As discussed earlier, in social sciences, it is hard for the researcher to randomly assign treatments. A natural experiment is when firms are naturally (i.e., without the researcher having to do anything) divided into treated and control groups, for example if a law affects some firms but not others.

Bertrand and Mullainathan (2003) study whether takeover defenses worsen firm performance by entrenching CEOs and allowing them to coast. Their natural experiment is the adoption of state anti-takeover laws. Crucially, different states passed these laws in different years. Consider two plants located in New York, one of which belongs to a Delaware-incorporated firm and the other to a California-incorporated firm. In 1998, Delaware but not California passed anti-takeover laws. The Delaware-owned plant is affected by the law and part of the treated group; the California-owned plant is unaffected by the law and part of the control group.

Assume that, after 1998, we found that the Delaware-owned plant produced 2 (units of output) and the California-owned plant produced 7.  We might conclude that anti-takeover laws reduce output by 5. But, such a conclusion would be premature. Perhaps inefficient firms happen to incorporate in Delaware, and so the Delaware-owned plant was performing poorly even before 1998. Thus, it’s not the law that caused the Delaware-owned plant to perform poorly – it was performing poorly anyway. So, we must perform what’s known as a difference-in-differences analysis, which is best explained by the following (hypothetical) example:

              Pre-1998   Post-1998   Difference
Delaware          8          2          -6
California       11          7          -4
Difference       -3         -5          -2

Since the Delaware-owned plant is generally less efficient, it was already performing worse than the California-owned plant pre-1998. The difference in their performance was -3, as shown in the bottom row. After 1998, the difference widened to -5. So, the difference-in-differences – the change in the difference after 1998 – is -2, and so we can conclude that anti-takeover laws cause performance to fall by 2. Crucially, we use the pre-1998 difference in performance to control for the fact that Delaware-owned plants might be inherently different from California-owned plants.

We could also reach the same -2 conclusion by using the right-hand column, rather than the bottom row. The performance of the Delaware-owned plant fell from 8 to 2 after 1998 – a difference of -6. But, we can’t attribute this decline to the anti-takeover law, because many other events could have happened in 1998 that caused this fall – perhaps the economy went into recession in 1998. This is the role of the control group – the California-owned plant. We can use its difference in performance of -4 to measure the impact of other events that happened in 1998. The difference-in-differences is -2. So, we reach the same conclusion that anti-takeover laws cause performance to fall by 2.
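The arithmetic of the hypothetical example can be written out in a few lines. This sketch uses only the four numbers from the table; the function and variable names are my own.

```python
# Hypothetical plant output from the table above.
output = {
    ("Delaware", "pre"): 8,
    ("Delaware", "post"): 2,
    ("California", "pre"): 11,
    ("California", "post"): 7,
}

def diff_in_diff(data, treated, control):
    """Change in the treated group minus change in the control group."""
    treated_change = data[(treated, "post")] - data[(treated, "pre")]
    control_change = data[(control, "post")] - data[(control, "pre")]
    return treated_change - control_change

# Right-hand column route: (-6) - (-4)
effect = diff_in_diff(output, "Delaware", "California")

# Bottom row route: post-1998 gap minus pre-1998 gap, (-5) - (-3)
pre_gap = output[("Delaware", "pre")] - output[("California", "pre")]
post_gap = output[("Delaware", "post")] - output[("California", "post")]
effect_by_rows = post_gap - pre_gap

print(effect)          # -2
print(effect_by_rows)  # -2
```

Both routes give -2 because they are the same four numbers combined in a different order.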

Reader Beware

  • Are the treated and control groups trending in the same direction? The California-owned plant is only a valid control for other events that happened in 1998 if it is affected by the same events as the Delaware-owned plant. This is why Bertrand and Mullainathan use two plants located in New York – if the New York economy suffers a recession, it should have the same effect on both plants. If they had instead compared a plant incorporated and located in Delaware to a plant incorporated and located in California, the latter would not be a good control as Delaware may have suffered a recession in 1998 but not California. So, it is critical that the treated and control groups be trending in the same direction – the change in their performance post-1998 should have been the same if no law had been passed. This is known as the parallel trends assumption.
    • Note that we do not require the treated and control groups to be similar. In the above example, Delaware-owned plants are less efficient than California-owned plants. The level of their productivity is different pre-1998 – we only require the change or trend in their productivity around 1998 to have been the same had no law been passed. We can check this by examining the trends in performance of both plants for several years prior to 1998.
  • Was the natural experiment anticipated? If the law change was anticipated, firms could respond in anticipation of the law. Then, a researcher might incorrectly conclude that the law had no effect – because the changes had already been made before the law got passed. Moreover, as Hennessy and Strebulaev (2016) show, anticipation may not only cause the measured effect to be weaker, but have the wrong sign.
  • Was the natural experiment exogenous? If firms could have lobbied for the law change, then it is no longer random whether a plant is treated or a control. Perhaps Delaware-incorporated firms knew that their future prospects were poor and lobbied legislators to pass anti-takeover laws in anticipation. As a result, we cannot conduct natural experiments using changes implemented by firms (as some papers do). For example, conducting a “difference-in-differences” between firms that choose to engage in stock splits and firms that do not would not allow causal inference, since firms endogenously choose whether they are in the treated group (those who split their stock) or the control group (those who don’t).
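The parallel trends check described in the first point can be sketched very simply: look at the gap between the treated and control plants in each year before the law, and see whether it stays put. The annual figures below are made up for illustration (they are chosen to end at the pre-1998 values of 8 and 11 used earlier), and the helper names are hypothetical.

```python
# Hypothetical annual output for the years before the 1998 law (made-up numbers).
delaware = {1994: 11, 1995: 10, 1996: 9, 1997: 8}
california = {1994: 14, 1995: 13, 1996: 12, 1997: 11}

def yearly_gaps(treated, control):
    """Treated-minus-control output gap for each year present in both series."""
    return {year: treated[year] - control[year]
            for year in sorted(treated) if year in control}

def gap_drift(gaps):
    """How far the gap moved over the pre-period.

    A drift of 0 means the gap was constant, which is consistent
    with (but does not prove) parallel trends.
    """
    values = list(gaps.values())
    return max(values) - min(values)

gaps = yearly_gaps(delaware, california)
drift = gap_drift(gaps)

print(gaps)   # {1994: -3, 1995: -3, 1996: -3, 1997: -3}
print(drift)  # 0
```

Here the levels differ by 3 every year, but both plants decline at the same rate, which is all the assumption requires. A gap that was already widening before 1998 would be a warning sign.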

III. Regression Discontinuity

Here, randomness occurs due to the independent variable falling either just below or just above a cutoff in an unpredictable way. For example, Cunat, Gine, and Guadalupe (2012) study the effect of shareholder proposals to increase shareholder rights. Showing that firm performance improves after such proposals are passed does not imply that the proposals caused the improvement, because they are endogenous. Perhaps a large engaged blockholder made the proposals, and it could be the blockholder that improved firm performance. So, they compare proposals that narrowly pass (with 51% of the vote) to those that narrowly fail (with 49% of the vote). Whether the vote narrowly passes or narrowly fails is essentially random, and uncorrelated with other factors such as the presence of blockholders – if there were large blockholders, they would likely increase the vote from 49% to (say) 80%, not 51%. They compare the stock price reaction to the vote outcome, as well as changes in long-term performance, of firms where a shareholder proposal narrowly passes to firms where a shareholder proposal narrowly fails (similar to a difference-in-differences). Since the stock price and long-term performance improve significantly more for the former set of firms, they show that increased shareholder rights cause higher firm value and long-term performance.
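To illustrate why the narrow comparison helps, here is a simulation sketch built on assumptions of my own, not on the paper’s data. The true effect of a proposal passing is set to 1, blockholders are present in half the firms, and a blockholder both adds 25 points to the vote and directly improves performance. The sketch is a bare-bones comparison of means in a 1-point window around the threshold; real regression discontinuity designs typically fit regressions on each side of the cutoff.

```python
import random

TRUE_EFFECT = 1.0   # assumed causal effect of a proposal passing
THRESHOLD = 50.0    # vote share (%) needed to pass

def mean(values):
    return sum(values) / len(values)

rng = random.Random(2)
firms = []
for _ in range(200000):
    blockholder = rng.random() < 0.5                 # confounder
    # Blockholders push the vote share up by a large amount.
    vote = rng.uniform(20, 60) + (25 if blockholder else 0)
    passed = vote >= THRESHOLD
    performance = (TRUE_EFFECT * passed
                   + 3.0 * blockholder               # direct blockholder effect
                   + rng.gauss(0, 1))
    firms.append((vote, passed, performance))

def pass_fail_gap(sample):
    """Mean performance where the proposal passed minus where it failed."""
    passed = [perf for _, p, perf in sample if p]
    failed = [perf for _, p, perf in sample if not p]
    return mean(passed) - mean(failed)

# All votes: passing is tangled up with having a blockholder.
naive_gap = pass_fail_gap(firms)

# Close votes only: within 1 percentage point of the threshold.
close_votes = [f for f in firms if abs(f[0] - THRESHOLD) <= 1.0]
rd_gap = pass_fail_gap(close_votes)

print(f"all votes:   {naive_gap:.2f}")
print(f"close votes: {rd_gap:.2f} (true effect {TRUE_EFFECT})")
```

Under these assumptions the all-votes comparison should overstate the effect several times over, while the close-votes comparison should land near 1, because firms just either side of 50% are equally likely to have a blockholder.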

For other examples of regression discontinuity that I have blogged about, see Flammer and Bansal (2017) on the effect of shareholder proposals to implement long-term incentives, and Malenko and Shen (2016) on the effect of proxy advisors on voting outcomes.

Reader Beware

  • Can firms perfectly manipulate the independent variable, i.e. choose whether they are above or below the threshold? Suppose directors have control over the votes of shares held in an employee benefit trust. Normally, they do not vote these shares, to avoid investor concerns about them distorting vote outcomes. However, in extreme conditions, they may. For close votes, control of these votes allows firms to essentially choose whether the vote is 51% or 49%. A firm might allow the proposal to pass if it is performing well (since it is not afraid of greater shareholder power), and cause it to fail if it is performing poorly. Then, whether the proposal passes or fails is endogenous – it depends on firm performance.
    • Note that if firms can only partially (not perfectly) manipulate the vote, regression discontinuity is still valid as there is still some randomness as to whether the vote narrowly passes or narrowly fails.
  • Are firms comparable on other dimensions above and below the threshold? Firms above the threshold are treated and firms below are controls. The treated and control firms should be comparable on all other dimensions. Comparability might be violated if (hypothetically) firms with higher-quality management were able to predict when the vote is going to be close and persuade “swing” shareholders to vote against the proposal. Thus, management quality might jump when you move from above to below the threshold.

IV. An Alternative Technique: Common Sense

Finding valid instruments, natural experiments, and discontinuities is difficult. So, an alternative approach to get closer towards causality is to use common sense. For example, if your effect is indeed causal, it should be stronger in certain circumstances. If a higher CEO equity stake caused superior firm performance, through providing the CEO with better incentives, the effect should be stronger where CEOs have greatest freedom to slack – in firms with little ownership by institutional investors, poor governance, and low product market competition. This is what von Lilienfeld-Toal and Ruenzi (2014) show, as blogged about here.

Brav, Jiang, Partnoy, and Thomas (2008) show that, after hedge fund activists acquire a large stake in a firm and announce an intention to influence control, performance improves. There could be reverse causality if the hedge fund predicted the improvements and acquired the large stake in anticipation. As blogged about here, the authors support causality by showing that the improvements are stronger when the hedge fund employs hostile tactics, and remain significant even when the hedge fund already had a large stake prior to announcing its activist intent.

Note that common sense does not show causality as cleanly as the first three methods; it can only suggest causality. (In the first example, perhaps the measures of governance are inaccurate). But, it should be added to the toolkit. Just as a discerning reader should use common sense to avoid being impressed by complex, but invalid, statistical techniques, he/she should also be open to common sense approaches to suggesting causality, even if they cannot prove it. Researchers using this approach must be careful not to make strong causal claims.

House of Commons Report on Corporate Governance

Today the House of Commons Select Committee on Business, Energy, and Industrial Strategy (BEIS) published its report on corporate governance, after extensive consultation of oral and written testimony from a wide range of stakeholders. I applaud the Select Committee for such an extensive, thorough job with an issue of national importance, and am personally grateful to them for publishing my initial and supplementary written testimonies as well as inviting me to testify orally in Parliament. I endorse the vast majority of the recommendations and believe that they will help “make Britain a country that works for everyone”, in Prime Minister May’s words. This post aims to summarize the 81-page report into a few simple bullet points, and link them to the evidence.

Executive Pay

  • LTIPs (bonuses based on hitting financial targets) to be scrapped from 2018; no existing LTIPs to be renewed.
  • Instead, give executives equity that they are required to hold for the long term (at least 5 years). The equity must not vest (= become saleable) all in one go
    • See here for the arguments for replacing LTIPs with equity, and here for evidence that CEOs cut investment when their equity vests
    • These ideas are also advocated by The Purposeful Company, a consortium of leading executives, investors, consultants, and academics (full report here, short summary here)
    • LTIPs are almost ubiquitous, but used because “we’ve always done it that way” rather than because they are effective. Given this common usage, the proposal is a radical one – but a highly desirable one – and I greatly applaud the Committee for its boldness
  • Where bonuses are used, they should be on wider performance criteria (e.g. qualitative factors) and must be stretching
  • Shareholders’ “say-on-pay” vote will remain advisory, rather than being changed to binding (as initially mooted).
    • However, if an advisory vote has < 75% support, there should be a binding vote the next year and the Remuneration Committee (RemCo) chair should be encouraged to resign
  • Firms should not be forced to put workers on RemCos, but worker representation to be on a comply-or-explain basis
  • Firms, public sector, and large third-sector organisations to publish pay ratios between the CEO and senior management, and the CEO and all UK employees. The ratio must be on a consistent basis each year
    • The actual advocacy of pay ratios was lukewarm, with little justification given. See my Harvard Business Review article for the potential unintended consequences of such disclosure (including for workers themselves).

Directors’ Duties and Reporting

  • More specific and accurate reporting on directors’ duties to other stakeholders, including long-term consequences of decisions
  • Reporting to contain fewer boiler-plate statements. Companies to be more imaginative and agile in communicating directly with stakeholders
  • The report recognises that UK corporate governance is very well regarded internationally. Thus, it strongly supports maintaining
    • The unitary board, where all directors share the same responsibilities
    • The statement of directors’ duties in Section 172 of the Companies Act (that directors “promote the success of the company for the benefit of its members” (i.e. shareholders) while having regard for other stakeholders)
    • “Comply or explain” guidelines (firms do not need to comply with certain guidelines, permitting flexibility – but if they do not, they must explain why not)
  • I particularly applaud the report’s caution against overreacting to the scandals at BHS and Sports Direct. These scandals are tragic, but do not mean that all companies should have to suffer.
    • The Report writes (paragraph 24): “Corporate governance in the UK is still strong and remains an asset to the country’s reputation for doing business. We are conscious that a small number of highly damaging examples of corporate governance failure should not lead to a hasty and disproportionate response. We do not believe that there is a case for a radical overhaul of corporate governance in the UK”

Expanded Role for the Financial Reporting Council (FRC)

  • FRC to introduce a new tiering system (Red, Yellow, Green) for corporate governance
  • FRC to engage and hold directors to account
    • If engagement unsuccessful, report failings to shareholders
    • If still no response, take legal action
  • FRC to be renamed and resourced, to match this expanded role

Private Companies

  • New governance Code for the largest private companies to be developed
    • Compliance to be examined by an expanded FRC, funded by a small levy on businesses

Shareholder Engagement

  • Paragraphs 13-16 recognise the importance of blockholders (large shareholders) and the dangers of the ownerless corporation
    • However, this point is not subsequently picked up. Encouraging large shareholders to form, and helping shareholders to engage with companies, could further help the Government’s mission. See Chapter 4 of The Purposeful Company Policy Report.
  • Investor Forum to facilitate better engagement between boards and shareholders, particularly if rated Yellow or Red by new FRC tiering system
  • Shareholders encouraged to engage more in pay

Stakeholder Representation

  • Companies to be encouraged to consider a Stakeholder Advisory panel, to consult stakeholders other than shareholders
  • Annual Report to contain a section on how firms engage with shareholders
  • Workers on boards should not be mandated, but report highlights that there is nothing in the law to prevent it. Would like it to become the norm by opening up new director positions to all.
    • Worker directors will not be a delegate of the workforce as a whole but act in their own capacity, and have the same rights and responsibilities as other directors

Board Diversity

  • 2020 target for half of new appointments to senior and executive management to be women. Companies should explain why they have failed to meet the target and the steps taken to address it
  • Every existing FRC reference to gender diversity should also add a reference to ethnic diversity


  • Firms to report on their people policy in the Annual Report, i.e. approach to investing in people and how they ensure that their pay and working conditions are reasonable
  • Investors to disclose voting records; FRC to name those who don’t vote
  • Firms to provide full information on advisors engaged in transactions

A Note on the Use of Evidence

  • The Report writes “The TUC states that “There is clear academic evidence that high wage disparities within companies harm productivity and company performance“.” This statement is actually false. The TUC (potentially inadvertently) quoted an unpublished 2010 paper, which found that high pay ratios are negatively correlated with firm performance. However, the final version of the paper was published in 2013 (i.e. 4 years ago). After going through peer review, it found the opposite result. In the authors’ own words, “We find that employees do not perceive higher pay ratios as an inequitable outcome. We do not find a negative relation between relative pay and employee productivity. We find that firm value and operating performance both increase with relative pay.”
    • This highlights the potential issue of “confirmation bias”. You can always find some academic paper to support any viewpoint (some studies support vaccination, others oppose it). So, just having “evidence” to support a viewpoint means little – what matters is the quality of evidence. One cannot just hand-pick an unpublished draft that shows what one would like it to show, particularly when the published version shows the opposite.
    • Claiming to be unaware of the published paper is not an acceptable defense. It is incumbent upon a witness, who chooses to quote an unpublished paper, to check whether it has since been published. Confirmation bias is not only misinterpreting evidence once you have received it, but also failing to search for new evidence. One cannot just stop at finding a half-finished paper because it shows what one would like it to show, and not bother to see if there is a finished version.
    • I highlighted in my supplementary testimony that the result was overturned (and the US evidence was independently confirmed using UK data in a paper forthcoming in a top journal). Thus, while the bulk of the Report is balanced and well evidenced, it is surprising that it contains a statement known to be wrong. The Oxford Dictionaries word of 2016 is “post-truth”, which has led to a widespread, and very welcome, acknowledgment of the importance of correcting untruths. Thus, when such corrections are made, they should not be ignored.
    • As stated in my supplementary testimony, “The goal of the above is absolutely not to discredit the TUC, which is an organisation I respect, and whose goal of encouraging ethical treatment of workers I very much share. [Indeed, I expect that we both share strong support for the Committee’s recommendation for firms to disclose their people policy.] This is simply intended to be one example of how important it is to be critical with evidence.”
    • Moreover, the paper’s finding that pay ratios are positively correlated with future performance is far from the final word. Academic evidence is only one input into a decision. My concern is only that, when evidence is quoted, it should be quoted accurately.
    • I will discuss best practices for the use of evidence in my upcoming TEDx talk, “From Post-Truth to Pro-Truth”, on 12 May in London. See here for details of the event and excellent other speakers.

Long-Term Executive Incentives Improve Innovation and Corporate Responsibility

Executive compensation needs to be reformed. But, most of the calls for reform focus on the wrong dimensions. They focus on the level of pay, or the ratio of executive pay to median worker pay – even though the evidence suggests that low ratios are linked to lower future performance. As I have argued in the Wall Street Journal and World Economic Forum, the most important dimension is the horizon of pay – whether it depends on the short-term or long-term.

We certainly want executives to act in the interest of society, and for a more equal society. But, the way to increase equality is not to bring CEOs down, but to induce them to bring others up. Treating stakeholders (workers, customers, suppliers, the environment) well is costly in the short-term, but the evidence shows that it pays off in the long-term. So the best way to encourage purposeful behavior is not to scrap equity incentives (thus decoupling pay from performance), but extend the horizon to the long-term.

The trouble is that it’s hard to find causal evidence of the effects of long-term compensation. This is because long-term compensation is not randomly assigned. If long-term compensation were correlated with superior long-term performance, it could be that incentives caused good performance – or that executives who knew that long-term prospects were good were willing to accept long-term incentives to begin with. An excellent paper shows that total stock ownership is associated with superior future performance, and that the relationship is likely causal, but it does not look specifically at restricted stock ownership, which the CEO is forced to hold for the long-term, as distinct from vested stock ownership.

An insightful new paper by Professors Caroline Flammer (Boston University’s Questrom School of Business) and Pratima Bansal (University of Western Ontario’s Ivey School of Business) addresses this causality issue. It studies shareholder proposals that not only are on executive compensation, but specifically advocate the use of long-term incentives (rather than advocating, say, cutting pay) – restricted stock, restricted options, or long-term incentive plans. However, simply looking at all proposals wouldn’t get round the causality issue. It could be that shareholder proposals arise due to a large engaged blockholder, and it could be the blockholder – not long-term compensation – that improves future performance. So, Caroline and Tima use a “Regression Discontinuity Design”. They compare proposals that narrowly pass (with 51% of the vote) to those that narrowly fail (with 49% of the vote). Whether you narrowly pass or narrowly fail is essentially random, and uncorrelated with other factors such as the presence of blockholders – if there were large blockholders, they would likely increase the vote from 49% to (say) 80%, not 51%.
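To make the intuition concrete, here is a minimal sketch in Python on simulated data. It is not the authors’ code or data: the function names, the sample size, the “true effect” of 1.0 and the strength of the hypothetical blockholder confounder are all assumptions chosen for illustration. It also uses a simple difference in means inside a narrow window around 50%, whereas published regression discontinuity studies typically fit local regressions on either side of the threshold.

```python
import random
from statistics import mean

def simulate_proposals(n=40000, true_effect=1.0, seed=0):
    """Simulate shareholder proposals. All numbers are made up for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        vote_share = rng.uniform(0.0, 1.0)
        # Hypothetical confounder: a strong blockholder raises both the vote
        # share and future performance, independently of whether the proposal passes.
        blockholder = vote_share + rng.gauss(0, 0.1)
        passed = vote_share > 0.5
        performance = 2.0 * blockholder + true_effect * passed + rng.gauss(0, 1.0)
        rows.append((vote_share, passed, performance))
    return rows

def pass_minus_fail(rows, bandwidth):
    """Mean performance of passing minus failing proposals whose vote share
    lies within `bandwidth` of the 50% threshold."""
    near = [r for r in rows if abs(r[0] - 0.5) <= bandwidth]
    passed = [perf for _, p, perf in near if p]
    failed = [perf for _, p, perf in near if not p]
    return mean(passed) - mean(failed)

rows = simulate_proposals()
naive = pass_minus_fail(rows, bandwidth=0.5)    # compares all proposals
narrow = pass_minus_fail(rows, bandwidth=0.02)  # compares 48%-52% votes only

print(f"true effect: 1.0, naive estimate: {naive:.2f}, narrow-window estimate: {narrow:.2f}")
```

In this simulation the naive comparison overstates the effect, because passing proposals also tend to have stronger blockholders. Close to the threshold, passing and failing proposals have almost the same blockholder strength, so the narrow-window estimate lands much nearer the true effect.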

They find that proposals to increase long-term compensation improve long-term operating performance, regardless of whether you measure it using return on assets, net profit margin, or sales growth. Interestingly, operating performance decreases slightly in the short-run, highlighting the fact that long-term orientation requires short-run sacrifices. But, the long-run benefits outweigh the short-term costs – firm value rises overall.

What’s the mechanism by which this happens? Skeptics might think that long-term CEOs might fire their workers – this is costly in the short-term (due to severance pay) but saves wages in the long-term. This is not the case. There are two channels, both of which are beneficial to society – so the rise in firm value is also socially optimal:

  • Innovation improves. Firms increase R&D. Moreover, they are not simply spending money blindly – this higher R&D expenditure leads to
    • More patents
    • Higher-quality patents (measured by citations per patent)
    • More innovative patents (measured by the distance from firms’ existing patents)
  • Corporate responsibility improves. Firms’ CSR ratings improve significantly, as measured by KLD ratings of a firm’s stewardship of four stakeholder groups: employees, the environment, customers, and society at large. The effects are strongest for employees; one of my own papers shows that employee satisfaction in turn improves firm value.

The goal of any pay reform should be to act in the long-term interests of society.  Leading compensation expert Kevin Murphy forcefully argues that politicians’ desire to cut the level of pay is not driven by social considerations (given there is no evidence that cutting pay levels improves behavior), but jealousy and envy, or the desire to appear tough. Ironically, despite emphasizing the importance of thinking long-term, politicians’ proposals to regulate the level of pay are incredibly short-term. There will be an immediate gain in public approval from appearing tough, but the long-term benefits of instead making compensation more long-term are much more important.

As an aside, Caroline previously used Regression Discontinuity in an excellent paper, published in Management Science, which shows that CSR proposals (again, those that pass by a small margin) significantly increase shareholder value and profits. This is a powerful result, since many naysayers argue that CSR is at the expense of shareholder value. Instead, businesses and society are in partnership with each other, not in conflict. As I argued in my TEDx talk, “to reach the land of profit, follow the road of purpose”.


Does corporate social responsibility improve firm value?

Below is an article I wrote two months ago for the World Economic Forum. Since it’s posted on the password-protected section of the WEF website, I reproduce it here.

Does corporate social responsibility (“CSR”) improve firm value? When companies make decisions, should they care only about shareholders or should they take other stakeholders (e.g. employees, customers, the environment) into account? This is a decades-old debate, but despite many cogent views on both sides, there’s surprisingly little hard evidence.

In 1970, Milton Friedman famously wrote that “the social responsibility of business is to increase its profits”. This view isn’t as hard-hearted as it may sound. Friedman argued that a company can only increase its profits by taking other stakeholders into account – producing high-quality products, treating its employees fairly, and having a good environmental reputation. Under this view, firms should focus exclusively on profits, and everything else will fall into place.  Considering other stakeholders beyond the profit implication is at the expense of shareholders: a dollar spent on reducing pollution (beyond the level that will avoid an environmental lawsuit) is a dollar that cannot be paid as dividends.

However, advocates of CSR argue that the Friedman view only holds in theory. In practice, it’s extremely difficult to quantify the profit implications of most socially responsible actions. A company could decide whether to grant an employee compassionate leave by trying to calculate the potential loss in morale and productivity if the leave was withheld, but these consequences are very hard to quantify. The CSR approach would be to grant the leave simply because it’s the right thing to do – because the goal of the company isn’t only to maximise profits, but to treat stakeholders with compassion. Treating employees fairly will eventually manifest in greater staff retention and future productivity. However, these long-run effects are difficult to quantify, so a firm focused exclusively on profits will not invest in its stakeholders.

Whether CSR improves firm value has been studied extensively by management scholars. Most studies find a positive correlation between CSR and measures of firm performance, such as profits. However, correlation doesn’t imply causation. It may not be that CSR causes a firm to perform better, but instead that firm performance causes CSR – only firms that are performing well can afford to spend money on their other stakeholders. In addition, some studies consider only one industry, or a short time period, and so are hard to generalize.

I decided to tackle this long-standing management question using a methodology from a different field – finance. This approach involves linking CSR not to profits, but to future stock returns, which reduces reverse causality concerns. If it was high profits that caused CSR, then the high profits would mean the company’s stock price would already be high today, and so we shouldn’t expect higher stock returns going forward.

The next decision is how to measure CSR. The main challenge is that CSR is extremely difficult to measure objectively, as it’s intangible. Tangible measures do exist – for example, one could measure workplace diversity by whether there’s a minority on the board. However, tangible measures are relatively superficial and thus easy to manipulate. For example, a company that cared little about workplace diversity could put a token minority on the board to “check the box”. A separate challenge is that CSR comprises many different dimensions – responsibility to employees, customers, the environment, etc. – and it’s unclear how to weight these different constituencies.

I thus focused on one particular dimension of social responsibility – employee satisfaction. I chose this dimension as a very thorough measure of it exists. Since 1984, there has been a list of the “100 Best Companies to Work for In America”.  This list is compiled by surveying the employees themselves – it’s the ultimate in fundamental, grass-roots analysis. Two hundred and fifty employees are randomly selected in a firm and asked 57 questions on various aspects of employee satisfaction (credibility, respect, fairness, pride/camaraderie), which had been developed through extensive discussions with managers, employees and workplace experts. As a result, it’s arguably the most respected measure of employee satisfaction.  Equally importantly, it has been available since 1984, and thus I have a long time-series which comprises both recessions and booms.

The first list came out in a book in March 1984, then another book in February 1993, and then in the January edition of Fortune magazine every year from 1998.  My methodology involves buying the Best Companies in April 1984, rebalancing the portfolio in March 1993 to take the new list into account, and then rebalancing it every February from 1998.  The one month delay is because I wish to test not only that employee satisfaction improves firm value, but also whether the market recognizes this link.  Even if employee satisfaction improves firm value, my strategy should earn no returns if the market recognizes this link.  As soon as a company appears in the Best Companies list, its stock price should go up, so I shouldn’t be able to generate returns by buying it one month too late.
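The timing rule above can be written down in a few lines of Python. This is only a sketch of the schedule as described in this article, not the code used in the study; the function names and the `last_year` parameter are hypothetical.

```python
def publication_dates(last_year):
    """(year, month) in which each Best Companies list was published,
    following the dates given in the article."""
    dates = [(1984, 3), (1993, 2)]                               # the two books
    dates += [(year, 1) for year in range(1998, last_year + 1)]  # Fortune, each January
    return dates

def one_month_later(year, month):
    """Shift a (year, month) pair forward by one month."""
    return (year + 1, 1) if month == 12 else (year, month + 1)

def formation_dates(last_year):
    """Portfolio formation dates: one month after each list becomes public."""
    return [one_month_later(y, m) for y, m in publication_dates(last_year)]

for year, month in formation_dates(2000):
    print(f"form or rebalance portfolio in {year}-{month:02d}")
```

Waiting one month means that any return the strategy earns cannot come from the initial reaction to the list being published. It can only come from information the market had not priced in a month later.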

I compare the returns of the Best Companies not only to the overall market, but also to companies in the same industry.  For example, Google is frequently in the Best Companies list, but its high returns could be due to the tech industry doing well, rather than its employee satisfaction.  I also compare each company to peer firms with similar characteristics (e.g. size, dividend yield, recent performance, valuation ratios).  In short, I try to control for as much as possible, to isolate the effect of employee satisfaction.  I also remove the effect of outliers, to ensure that any superior performance of the Best Companies isn’t due to a few star performers such as Google.

I find that the Best Companies beat the market by 2-3%/year, over a 26-year period from 1984-2009.  This outperformance is highly statistically significant, and also economically meaningful – a fund manager who beats the market by 1%/year for 5 years is considered to be skilled.  Moreover, this outperformance is based on a very simple trading strategy using public information on large firms.
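To see why 2-3%/year is economically meaningful, it helps to compound it. The sketch below makes a simplifying assumption that is mine, not the paper’s: the outperformance is the same every year and compounds multiplicatively on top of the market return, ignoring risk adjustment and trading costs. The function name is hypothetical.

```python
def relative_wealth(annual_outperformance, years):
    """Wealth from the strategy divided by wealth from the benchmark after
    `years`, assuming a constant multiplicative outperformance each year."""
    return (1 + annual_outperformance) ** years

low = relative_wealth(0.02, 26)   # 2%/year over the 26-year sample
high = relative_wealth(0.03, 26)  # 3%/year over the 26-year sample

print(f"After 26 years: {low:.2f}x to {high:.2f}x the benchmark's wealth")
```

Under these assumptions, an investor in the Best Companies would have ended the period with roughly 1.7 to 2.2 times the wealth of an investor in the benchmark.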

The results have three main implications.  First, they suggest that employee satisfaction is beneficial for firm value.  While it may seem natural that companies should do better if their workers are happier, this is far from obvious.  Indeed, the 20th century way of managing workers is to view them as any other input – just as a manager shouldn’t overpay for or underutilize raw materials, they shouldn’t do so with workers. High worker satisfaction may be a sign that workers are overpaid or underworked.  However, the world is different nowadays.  Human capital is the main asset in many firms, and employee welfare can improve productivity, retention, and recruitment.

Second, even though employee satisfaction may be beneficial in the modern firm, the market doesn’t recognize this link. Even though I wait a month before forming my portfolios, the strategy generates superior returns.  Similarly, the Best Companies typically report earnings that beat analyst expectations – analysts aren’t aware of the benefits of worker welfare.  Indeed, I show that it takes 4-5 years before the market fully incorporates the value of employee satisfaction.  This may be because traditional methods of valuing companies are based on the 20th century firm, and emphasize tangible factors such as short-term profits.  This result has broader implications for firms’ incentives to invest for the long-run.  If investors continue to value companies based on short-term profit, then managers will pursue short-term profit rather than long-run growth.

Third, Socially Responsible Investing (SRI) – incorporating social considerations into portfolio choice – can add value.  The traditional view is that SRI is costly to investment performance, as it involves screening out good investments and screening in bad investments.  However, the Best Companies strategy generates high returns while supporting companies who treat employees responsibly – investors can do well and do good.  This result is a consequence of the first two implications – employee satisfaction is beneficial (the first implication) but the market doesn’t recognise that it’s beneficial (the second implication).

In concluding, it’s worth highlighting some caveats to my study.  First, I’ve only shown a link between stock returns and employee satisfaction, and not other dimensions of CSR.  Further research must be done to study whether there’s any link with environmental protection, animal rights, etc.  However, since the traditional view is that no dimension of CSR should add value, the results are an important first step towards demonstrating the benefits of CSR more broadly.  Second, while I control for many observable factors (industry performance, firm size, dividend yield, etc.), I can’t rule out the explanation that an unobservable variable (e.g. good management) causes both employee satisfaction and superior returns.  If so, my first implication is no longer causal – improving employee satisfaction (without changing management) won’t improve stock returns.  However, the other two implications remain.  It remains the case that the stock market misvalues intangibles – just that the intangible being misvalued is good management rather than employee satisfaction.  It also remains the case that a socially responsible investor could have bought companies that treat their employees well and earned superior returns.
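The second caveat can be illustrated with a small simulation. Everything in it is hypothetical: management quality is made to drive both employee satisfaction and returns, while satisfaction itself has no effect on returns at all. In the simulation we can observe management quality and strip it out; in real data we cannot, which is exactly the problem.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)
n_firms = 5000

# Hypothetical data-generating process: management quality raises both
# satisfaction and returns; satisfaction has NO direct effect on returns.
management = [rng.gauss(0, 1) for _ in range(n_firms)]
satisfaction = [m + rng.gauss(0, 1) for m in management]
returns = [m + rng.gauss(0, 1) for m in management]

raw = pearson(satisfaction, returns)

# Remove the part of each variable that is due to management quality
# (possible here only because the simulation lets us observe it).
sat_resid = [s - m for s, m in zip(satisfaction, management)]
ret_resid = [r - m for r, m in zip(returns, management)]
controlled = pearson(sat_resid, ret_resid)

print(f"raw correlation: {raw:.2f}, after removing management quality: {controlled:.2f}")
```

The raw correlation is clearly positive even though satisfaction causes nothing, and it vanishes once management quality is removed. This is why the first implication may not be causal, while the other two (the market misvalues an intangible, and the investor still earns the returns) survive.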

Further reading:

Edmans, Alex (2011): “Does the Stock Market Fully Value Intangibles? Employee Satisfaction and Equity Prices”. Journal of Financial Economics 101(3), 621-640

Edmans, Alex (2012): “The Link Between Job Satisfaction and Firm Value, With Implications for Corporate Social Responsibility.” Academy of Management Perspectives 26(4), 1-19