Consent to admin data linkage and bias

Surveys are increasingly being linked to administrative records, which can greatly enrich survey content on subjects such as health and education at limited extra cost. A major challenge to data linkage arises when respondents refuse permission to link their administrative records to their survey responses. Refusal reduces the sample size and introduces sample bias if consent is correlated with respondents’ characteristics.

At the Centre for Longitudinal Studies we are carrying out an ESRC-funded project to analyse consent to admin data linkage in the Millennium Cohort Study (MCS). MCS follows 19,000 children born in the UK in 2000-01 and, so far, data have been collected when the cohort members were nine months, three, five and seven years old. One of the important features of MCS is that the same consent questions are put to the main respondent (i.e. the child’s parent, in most cases the mother) and the partner, and some of the questions are asked at each wave of data collection. The aim of this project is to assess the correlates of consent and to ascertain whether a latent propensity to consent exists across domains, at the household level and over time.

When we analysed multiple consent questions from MCS wave 4 for the main respondent, we found that ethnicity, gender, age, the respondent being a reserved person (as reported by the respondent), and the number of siblings in the household had a significant effect on the likelihood of consenting. We also found a highly significant latent propensity to consent across all consent questions, irrespective of the domain, since all questions were answered by the main respondent.

At the household level we analysed four consent outcomes for two domains – health and economic records – two for the main respondent and two for the partner. The variables with the largest and most significant effects were religion, the respondent being reserved, the respondent’s age and whether the respondent received benefits. The analyses also showed that even when different respondents in the same household answered these questions, a strong and significant latent propensity to consent existed; it was strongest for the combinations of questions answered by the same respondent.

In general, we found that respondents from ethnic minorities, older respondents and those who are more reserved were less likely to consent to data linkage. In contrast, non-religious respondents, white respondents, those with larger families, and those receiving benefits were more likely to consent. However, the impact of these variables varied according to the subject of consent and who was asked for consent.

When consent was analysed over time for the same respondent and the same domain (i.e. health), only ethnicity and whether the respondent is a reserved person had a significant effect. Further, the results showed that only a weak latent propensity to consent exists between consent outcomes over time. This finding indicates that the likelihood of consenting depends on two major groups of factors: those that are fixed over time (e.g. ethnicity) and those that reflect the circumstances surrounding the interview. The first group is likely to influence consent over time because it is intrinsic to the respondent. In contrast, the second group of variables represents extrinsic circumstances which are time specific. Further, the weak latent propensity to consent over time is probably an indication that people forget what they did in the previous survey and hence only their intrinsic characteristics affect their choice.

Beyond MCS, the findings of this study suggest that if consenters differ markedly in their characteristics from the entire sample of respondents and if non-consent is high, then the significant impact of some of the correlates (e.g. ethnicity, openness, age) requires adjustment for consent bias. Weighting and multiple imputation can be used for this purpose. However, the effectiveness of these techniques depends on the ability of the researcher to identify the variables which correlate with consent.
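
As a rough illustration of the weighting idea, the Python sketch below applies a simple weighting-class adjustment: each consenter is weighted by the inverse of the consent rate in their group, so that groups with low consent are not under-represented in the linked data. This is not the method used in the project; the function name `consent_weights`, the field names and the toy sample are all hypothetical, and a real analysis would more likely model consent on several correlates at once.

```python
from collections import defaultdict


def consent_weights(records, stratum_key, consent_key="consented"):
    """Weighting-class adjustment for consent bias (illustrative only).

    Consenters receive the inverse of the consent rate observed in their
    stratum; non-consenters receive 0.0 because they have no linked data.
    """
    totals = defaultdict(int)      # respondents per stratum
    consenters = defaultdict(int)  # consenting respondents per stratum
    for rec in records:
        totals[rec[stratum_key]] += 1
        if rec[consent_key]:
            consenters[rec[stratum_key]] += 1

    weights = []
    for rec in records:
        stratum = rec[stratum_key]
        if rec[consent_key]:
            # Inverse consent rate: n_stratum / n_consenters_in_stratum
            weights.append(totals[stratum] / consenters[stratum])
        else:
            weights.append(0.0)
    return weights


# Hypothetical toy sample: group A consents at 3/4, group B at 1/2.
sample = [
    {"group": "A", "consented": True},
    {"group": "A", "consented": True},
    {"group": "A", "consented": True},
    {"group": "A", "consented": False},
    {"group": "B", "consented": True},
    {"group": "B", "consented": False},
]
weights = consent_weights(sample, "group")
```

In this toy sample the single consenter in group B is weighted more heavily than each consenter in group A, and the weights sum to the full sample size. The adjustment only removes bias to the extent that the chosen grouping variable captures the characteristics that drive consent.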

This article was first published in the December issue of Research Matters.