A policy evaluation account. Interview with Jochen Kluve.
Jochen Kluve is Professor of Empirical Labor Economics at the School of Business and Economics, Humboldt-Universität zu Berlin, and Head of the Berlin Office of RWI, a center for scientific research and evidence-based policy advice. Since 2009, he has been a member of the advisory committee on active labour market policy evaluation projects at Ivalua.
In the midst of the economic crisis Catalonia has reached an unemployment rate of 20%, doubling the EU average, and pushing the debate on Active Labour Market Policy (ALMP) back to the top of the agenda. Moreover, the design of ALMP has recently been decentralized to Autonomous Communities, thus opening a window for policy reform and innovation. What should be done? What is the evidence on what works in ALMPs?
From the available evidence, probably what helps an unemployed person best in the short run are the job search assistance programmes. So, when a person becomes unemployed, the first step would be to carry out intense screening and profiling: trying to figure out what qualifications the person has and what potential avenues they should take, helping them learn how to look for jobs and how to write applications, and forcing them into regular meetings on a weekly or biweekly basis; basically, everything that’s labelled “activation”. That would typically be the first step. It does not make sense to start with programmes right away, because essentially what really helps people in this first period is to give them structure in their job search, and to get an idea of the case, the qualifications and what is right for them. A good example of that would be the “Gateway Period” of the New Deal programmes in the UK; that’s essentially what you start with.
When do programmes come into the picture?
Activation is the first thing to do, and then if people do not leave the register, i.e. they do not find a job through this first step, you have to try to figure out what the right thing is to do with them next. Available evidence suggests that what seems to work reasonably well are wage subsidy programmes in the private sector. However, I am afraid we do not know too much about how sustainable the job matches that they create actually are; how long those people actually stay in the positions once the subsidy runs out, and how we can avoid any substitution and displacement effects; there is always a certain probability that it is all just a windfall benefit, so companies hire people they would have hired anyway, but now they also take the subsidy for doing it. Of course, perhaps participants stay on afterwards, but if it is something that would have occurred anyway, it is still public money lost. That depends a lot on which specific country or local labour market we are talking about. So regarding the wage subsidies, existing work has identified positive effects in general; but more research is certainly needed. There still aren’t that many papers analysing this type of programme, despite the fact that it seems to be a popular policy.
What about training programmes?
Basically, trying to enhance human capital, trying to increase people’s qualifications in order to improve their employment chances seems a sound idea, and the literature has mostly shown a positive albeit modest effect. There are quite a few interesting patterns that research has shown regarding training programmes. First, training works better in the long run, so if you put the picture together it makes sense: first you activate people, you tell them “OK, you have to show up once per week, write ten applications per week, etc.” and in that way you get part of the people moving into jobs. Then you have to consider what to do with those who do not find jobs through activation, and that is where the idea of providing qualifications, skilling them or re-skilling them (depending on what they have learned and done so far) comes in. And this is something that can bring about benefits that materialize years later. Second, in a recent analysis for Germany we looked at continuous training programmes during which people are trained for between one month and thirteen months. We analysed whether there is a pattern in the length of the treatment, since some people are assigned treatment lengths of two months, others six months, etc. The interesting pattern that we find is an increasing effectiveness until around 100-120 days, and then it essentially flattens out. So it seems that over the first four, five… maybe six months, the effect increases, but putting additional effort into qualifying people beyond that does not seem to pay off. This pattern suggests that during the first six months it makes sense to invest in training but then the following six months do not seem to make much of a difference anymore.
Is there any evidence concerning what type of training is most useful?
The training evaluation literature often provides little detail about what participants actually do. This is because, for instance, administrative data used for academic research may only contain information of the type “person X was in some training programme during a given month Y”, but does not specify which training. Or a researcher may not take the trouble to find out what type of training people actually did. So, typically in the literature we see merely a rough classification of whether it was classroom or on-the-job training. Hence, it is often more of a plausibility issue; you would think that it makes most sense to combine some type of classroom training with some on-the-job training or work experience. But it is difficult to derive any general truths. One thing that is definitely important is to combine training participation with some form of accreditation, which could even be a degree in the case of long re-skilling programmes. First, this creates incentives for participants to finish the programme; second, it is a signalling device to potential employers.
Public employment schemes, such as subsidies to create jobs in public and non-profit sectors, are also a widely used ALMP.
Indeed, though this seems to be a bad idea. It is probably not surprising that evaluation results turn out to be poor when you are just giving people jobs without any idea of increasing their human capital. This bad performance of public employment programmes is a consistent pattern across countries. The evaluation of the Catalan employment plans, for instance, has shown that these schemes can easily turn into an in-kind subsidy for local governments and non-profits, which end up hiring the young, rather qualified candidates that best fit the position, instead of the unemployed who are most in need of regaining contact with the labour market. Hence, there is an international trend to abandon or trim down this kind of ALMP, and keep it only as a sort of passive policy of last resort for the most disadvantaged and difficult to place in the labour market.
Another major concern in Catalonia is youth unemployment and the large group of youngsters that neither study nor work. Concerning the specific group of young, unemployed people with low qualifications, is there any type of ALMP that has shown good results?
We see from many evaluation studies that ALMP programmes do not seem to work for the young. That probably has a lot to do with the fact that once you are 25 and long-term unemployed, it is probably too late; probably not too late for life, but it is very difficult for a training programme to make a difference if at age 25 a person has already spent 2, 3, or 5 years unemployed, probably even after having dropped out of school, or with only basic educational attainment anyway. So the policy issue does not really regard ALMP, but rather something kicking in at a (much) earlier stage. Essentially the first policy issue is to think about how to decrease the drop-out rate, and once that has been addressed, the next issue is what to do with youngsters once they are 16 and graduating from school, and how to get them into some type of vocational education. You could potentially trace this logic further “down” the age of a person, until the policy issue is basically how to design the entire schooling system starting with early childhood education.
So is there anything that ALMP can do for young people without employment?
Yes, to some extent. As I mentioned earlier, the “New Deal for Young People” in the UK would be a blueprint for this. First you have an activation period, or a “gateway period”, during which a case worker has to establish what problems the youth is facing, what the person has done so far, what his or her qualifications are, what jobs they could lead to, and get them moving. Then, if that does not work, the New Deal offers one of four options: further training and education, public employment in a general job or in an “Environmental Task Force”, or subsidised employment. And you choose the option in response to the particular needs of a given person. You have this information from the profiling during the Gateway stage. And this ideal procedure “First, Gateway; second, tailor-made ALMP” is the same for youths and adults alike.
Another issue that is currently gaining importance is the link between active and passive policies, which many countries have already established and which Spain is only now starting to build. In general, what do you think should be taken into account when establishing that link?
Establishing this link is important. Until the 1980s or the 1990s many Western countries had systems involving open-ended benefits, with a very weak connection between benefit receipt and compulsory programme participation or sanction elements. This has changed significantly. In Germany nowadays, for instance, if you have worked and contributed to unemployment insurance and you become unemployed then you get a Type 1 benefit for up to, typically, one year (there are certain exceptions, so it could also last up to 18 months, but typically it will be for one year) and then you move on to benefit Type 2, which is means-tested and no longer connected to what you previously earned. These are basically welfare benefits; they also depend on what your family situation is. Moreover, there are certain sanction elements that are mostly connected to search behaviour, but to a lesser extent also to participation in ALMPs.
Would you recommend the German model?
Well, in this matter we have learned from other countries. Denmark is really the blueprint for the “flexicurity” model in which employment protection is low and people can lose their jobs relatively easily but then during the first year of unemployment they receive high benefits, and once they move towards the end of the entitlement period, i.e. one year, then they are forced to participate in some kind of ALMP. You can see that exit rates increase as people move towards the end of that period. What is most important is that there are no exceptions, so sanctions seem to be a credible threat. Therefore, people are typically unemployed for a maximum of 12 months and then they go into a programme. I think in Germany we have come a long way in changing the system (Type 1, Type 2, sanction elements, etc.), so that has certainly been a good thing. But in practice there are still certain problems, there are still certain exceptions; for instance, older people can receive benefits for a longer period, so there is some give and take.
You need incentives and sanctions for the system to work.
Yes, on the one hand the scheme depends on the sanction elements and how to force people into doing something and, on the other hand, it depends on the rewards for that “doing something”, on the incentives to actually take up work. Essentially you need the right balance between “carrots and sticks”.
One of the reforms that the Spanish Government is currently considering is to commission the delivery of ALMPs to the private sector, specifically with regard to job search assistance.
This is not necessarily a good idea. It seems to me that the whole issue of contracting out always comes with a prior that contracting out must be a good thing; that it must be better than having it publicly provided, and I’m not sure whether that is true. I think the public administration can be efficiently organised, and I think from the perspective of the client the important thing is to combine everything into a “one-stop service”; you come and then typically you have just one institution that you need to talk to, that assists you with your benefits and with other things that you need, such as an ALMP. I’m not sure that a strong case can be made for contracting out, given that you would want to have an integrated service, and a case worker who is familiar with a person’s needs, and who does the screening and profiling. I would say that the kind of good gateway service, the screening and profiling, should be publicly provided, and then the training and other programmes can be contracted out.
The rationale goes that contracting out would induce competition and the best performing firms will be awarded contracts and funds. But is it really possible to build such a market?
It is possible, but then you would need some means of measuring success. So if a private placement firm says “we have placed these people” you need to find out what type of placements they are in. I understand that that kind of market can work, but then what are the incentives for the firms? They need to create placements, maybe not caring too much about the real needs of the worker, like “is he looking for a job?”, “is the placement actually sustainable?”, “is it a well-paid job, and does the person have some perspective in the medium or long run?”. It is possible to create a market for this type of service but you also have to take into account what the providers’ incentives are.
Another issue on the agenda is how to bring together unemployment benefits and social assistance, which sometimes serve the same population.
In Germany, for instance, this has been a tricky issue. Before the Hartz reforms, which included the reorganisation of the employment services, the Federal employment offices in the municipalities administered both the unemployment benefit, which was contribution-financed, and the unemployment assistance, which was a long-term unemployment benefit. And there was a parallel structure, run by the municipalities, which offered social assistance, which was very similar to the unemployment assistance. The social assistance typically catered to the harder cases, whereas the unemployment assistance was more focused on people who just did not find jobs, but they remained very similar. In this regard, the Hartz reforms wanted to overcome these two parallel structures that essentially served a very similar population, and to combine the unemployment assistance and social assistance. Therefore, we created the Type 1 benefit, which is still contribution-financed and replaces the unemployment benefit, and the Type 2 benefit, which is means-tested and comes from tax money and which combines and replaces both unemployment and social assistance. But, since in the old system the unemployment assistance was administered through the employment offices and the social assistance was administered by the municipalities, we needed to find a way for them to collaborate. Therefore, the government created the so-called Arbeitsgemeinschaften, or collaborations, where they were supposed to cooperate. However, the opposition wasn’t happy with that and neither were some of the municipalities, so the law was modified in such a way that we ended up with so-called “opting out municipalities”, who can say “we don’t want to cooperate because we think that we can do it better alone than if we do it together with the Federal employment offices”. That is why we have this setup of roughly 350 collaborations and 69 “opting out” municipalities. 
A recent evaluation showed that there is something to be gained from the collaborating structures, because of efficiency gains of centralisation and the input they get from the Federal Employment Agency.
Now that you have mentioned the Hartz reforms, our understanding is that compulsory evaluations were prescribed. Did that work well? Did you manage to influence the design of ALMP?
Yes and no. The good thing is that it was the first time in Germany that it was really stated in the law that an evaluation was needed, so that was certainly a welcome development, and the outcome of the evaluation was really made public, including a conference to present the results. As for the impact of evaluations on the policy design, that is when the answer is “yes and no”, since some of the recommendations that came out of that evaluation were taken up and things were changed, and others were not. That is when the political process starts, and as a researcher you have no further influence on what happens. That was about five years ago and there were certainly some reasonable recommendations that were made in that evaluation, which have still not been implemented, which are still on the agenda, and it still makes sense to implement them, but as politics go it just has not happened. But other things were changed. For instance, there was some type of temporary work agency which was run by the employment offices and which did not show good results in the evaluation, and that was actually abolished. Another example is that there were very bad evaluation results on public employment creation schemes, and those programmes too were largely abolished. But, of course, it also went the other way: there was a start-up subsidy programme that was shut down despite the good results that it showed in the evaluation.
What does it take to convince lawmakers and policymakers of the importance of evidence input to make decisions?
For one, it can now be shown from examples from other countries that their experiences with evaluations are positive, that they learn how to redesign their policies. It seems to me (I do not know if it is true for Spain, but I know it is true for Germany) that the core reason why policymakers do not like evaluation is this prior: they think “if I evaluate a programme, it will come out badly, so I cannot evaluate”. That, of course, tells you something about what they think of their policy in the first place. So they only see the danger involved; there is an imbalance between the danger they see in the possibility that the evaluation comes out with a bad result, relative to the chance that they really can learn something about how the programme works, how to redesign it, and that they can potentially come up with an academically-proven result that says “yes, that was a good thing you did, and we can show it was a good thing”. And then, in practice, another thing that sometimes does not help is that politics does not have the time to wait for results.
In this regard, you spent some time in the USA and are currently working in Germany. Have you noticed any differences in the evaluation cultures on each side of the Ocean?
In the U.S. it certainly seems to be more advanced in the sense that the idea to do something is, from the start, connected to the idea of wanting to know whether the thing you do is a good thing or not. The willingness or readiness to learn about the effectiveness of policy interventions is certainly more advanced than it is in Europe and also in Germany. This results in the idea of doing randomised experiments to evaluate policies, of using the most advanced methods to learn something about programmes, etc. Moreover, in the U.S. there are more flows between academia and accountability institutions. In Germany we do have a big accountability institution, the Bundesrechnungshof, which is a huge institution whose purpose is to look at what the government does and whether it does it in an efficient way or whether it is wasting money, but they do not seem to be evaluating anything academically. There are thousands of people there but I do not know exactly how they do it. Every now and then you see in the press that they say “this was money wasted”. I would say there are many instances in which they point to the right problems, but it does not seem to be connected with extensive scientific work.
In a recent meta-analysis you state that ALMP evaluations are undergoing major advances in methodology and data quality. Could you please summarise the state of the art in ALMP evaluation methods?
The most robust method is a randomised controlled trial, perhaps starting with a pilot before full roll-out of the programme. In the absence of that you need good administrative data. In our meta-analysis we found that researchers mostly use matching methods, so you basically try to mimic the experiment using a lot of observed information from typically large data sets, and try to construct a comparison group that you can legitimately compare to your treatment group. Another way to go is duration models, which only require unemployment register data. The good thing is that you do not need employment records since you look at the time spent until people exit from unemployment, and do not really look at where people go to. So those are the typical methods that are used. Of course, every now and then people come up with a given situation in which a policy change allows for a natural experiment or a regression discontinuity design, since some groups are created that are really nicely comparable, even though they are not generated by a controlled trial. For instance, the British New Deal for Young People targets people who are 18 to 24 years of age. So basically you design the evaluation by comparing those in the programme who are just under 25 with those just over 25. The latter are not eligible to enter the programme but are otherwise very similar. That is certainly a nice comparison group generated by the programme eligibility rule.
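The age-cutoff comparison described above can be illustrated with a minimal sketch. It assumes a simple difference in employment rates within a narrow age window on either side of the cutoff, which is a deliberately naive stand-in for a full regression discontinuity estimate; the function name, the one-year bandwidth and the records are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Eligibility rule mentioned in the interview: the programme targets ages 18 to 24,
# so people aged 25 or older are not eligible.
CUTOFF_AGE = 25.0


def cutoff_comparison(records, bandwidth=1.0):
    """Difference in employment rates just below vs. just above the cutoff.

    records: iterable of (age, employed) pairs, with employed equal to 0 or 1.
    bandwidth: width of the age window on each side (hypothetical choice).
    """
    below = [emp for age, emp in records
             if CUTOFF_AGE - bandwidth <= age < CUTOFF_AGE]
    above = [emp for age, emp in records
             if CUTOFF_AGE <= age < CUTOFF_AGE + bandwidth]
    if not below or not above:
        raise ValueError("no observations on one side of the cutoff")
    return mean(below) - mean(above)


# Hypothetical records (age, employed one year later); not real data.
records = [
    (22.0, 1),  # outside the window, ignored
    (24.2, 1), (24.5, 1), (24.8, 0), (24.9, 1),  # just under 25: eligible
    (25.1, 0), (25.3, 1), (25.6, 0), (25.9, 0),  # just over 25: not eligible
    (27.5, 0),  # outside the window, ignored
]

gap = cutoff_comparison(records)
print(f"Employment-rate gap at the cutoff: {100 * gap:.0f} percentage points")
```

A narrower bandwidth makes the two groups more similar but leaves fewer observations on each side, which is the usual trade-off in this kind of design.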
Do you think that implementation is given proper attention in evaluation studies and that, in general, qualitative techniques are given a proper use in evaluation?
Often not to the extent that would be desirable. I do quantitative evaluation myself and I of course would be “pro-quantitative” and say that that is the important part, but it often stops short of looking at what is really happening with the programme. Having some qualitative elements always helps to understand how a programme works, and even if you quantitatively measure a precise effect it still does not tell you how the programme managed to bring about that effect; which elements of the programme really worked and how to interpret that. Imagine you have a randomised control design between two groups: one gets training and you have a control group, and you can measure the outcomes and say “OK, there’s a 10 percentage point increase in the employment rate of the participants”; then that is valuable, crucial information, no doubt, but you may still want to know what exactly led to those results. If you want to dig a little deeper, you need some further information.
In the specific case of ALMPs, do you think data is still a barrier to the development of evaluations? Do you feel that evaluations in Germany are limited by the lack of data, which is the case here in Spain, or has that improved?
It has improved a lot in the sense that, especially in the labour market context, the Federal Employment Agency collects a lot of data, and right now it is typically available to any researcher. Ten or fifteen years ago they also collected a lot of data but it was simply not public. There were some evaluations back then, but that was mainly because there was a limited set of researchers who were allowed to cooperate with the agency, that is, who were given access to the data. Nowadays the Federal Employment Agency provides many useful, comprehensive data sets, and access is good.
Evaluation methods have attained a level of sophistication which is hardly understandable for policymakers and practitioners, let alone the general public. According to your experience, what is the best way to disseminate evaluation results to promote change?
I do not think it is that difficult to get the results across, because the basic ideas are quite simple. It seems to me that it must be the general idea of having something evaluated which seems scary to policymakers; but I am not really sure that is because the methods are particularly complicated. I think the basic idea that we try to explain is “OK, I have 1000 people in a training programme and 500 find a job. What does that tell me?” It tells you nothing, because you do not know anything about the counterfactual. I think those are basic ideas that can be communicated, that can be understood by people who are not professionals in the field. You do not need any econometrics to understand the main idea of how to come to causal inference; that you need a control group, because if you only have the treatment group you do not know what would have happened to them in the absence of the programme, so you need something that is as comparable as possible. Randomised controlled trials are the most intuitive; and once a person has understood a randomised controlled trial it is easy to understand matching, and then you can also explain other natural experiment techniques: 24 years versus 25 years, those born before January, those born after January, etc. They are all different mechanisms that are trying to reach the idea of what we can say about what would have happened without the programme. And I do not think that is something that is too difficult to explain or understand.
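The counterfactual point above can be made concrete with a small sketch. The 1000 participants and 500 job finders come from the answer; the comparison-group figures are hypothetical, picked only to show that the same participant outcome is consistent with a positive, zero or negative estimated effect.

```python
def effect_in_percentage_points(treated_n, treated_employed,
                                control_n, control_employed):
    """Difference in employment rates between participants and a comparison group."""
    treated_rate = 100 * treated_employed / treated_n
    control_rate = 100 * control_employed / control_n
    return treated_rate - control_rate


# From the interview: 1000 people in a training programme, 500 find a job.
TREATED_N, TREATED_EMPLOYED = 1000, 500

# Hypothetical comparison groups of 1000 similar non-participants each.
for control_employed in (300, 500, 600):
    effect = effect_in_percentage_points(
        TREATED_N, TREATED_EMPLOYED, 1000, control_employed
    )
    print(f"comparison group: {control_employed}/1000 employed -> "
          f"estimated effect {effect:+.0f} percentage points")
```

Without the comparison group, the figure of 500 out of 1000 cannot distinguish between these three cases.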
Finally, we wanted to ask what is next on the ALMP evaluation agenda. What should evaluation focus on?
I think the most important thing is to always systematically connect interventions with the evaluation efforts; that is the main priority. To try to involve researchers in the process as early on as possible, so that you can ideally, while still designing the policy, already be designing the evaluation that goes along with it, and then it will naturally come to the point when you can say “OK, a pilot might make more sense in this specific case”, for instance. That does not mean that if you are not involved from the beginning, evaluation does not make sense… I do not want to be the academic who says, “Well no, if it’s not a randomised experiment, I’m not interested”; because I believe there is a pragmatic solution for every programme evaluation task. So if the two sides talk to each other and the academics get involved, I’m convinced there can always be a way to identify an evaluation method to answer the question “What is the effect of programme X?”