Book

Kirkpatrick S
Reproducibility in Dietary Assessment

3rd Edition    August, 2025


Abstract

A dietary assessment method is considered reproducible if it gives very similar results when used repeatedly in the same situation. The reproducibility, or precision, of any dietary method is a function of the measurement errors (discussed in Chapter 5), uncertainty resulting from true variation in daily nutrient intakes  (Section 3.1.2), and variability introduced by a variety of other confounding factors (e.g., age and sex, day of the week, season, chronic illness, body weight status). If measurement errors and confounding factors are minimized, uncertainty in the estimation of usual nutrient intakes remains. Consequently, although the results from two separate occasions may disagree, the method may not have poor reproducibility; rather, food intakes may have changed. Therefore, only an estimate of reproducibility can be made.

For a group of individuals, true variability arises because dietary intakes differ among individuals (between- or inter-person variation) and within one individual over time (within- or intra-person variation). For instance, variation associated with factors such as day of week and season may be shaped by cultural and environmental factors. Unlike measurement errors, between- and within-person variation should not be minimized, because both characterize the true usual intake of a group of individuals. Instead, the dietary assessment protocol should allow these two sources of variability to be separated and estimated statistically. In this way, the magnitude of the effect of within-person versus between-person variation can be considered during the interpretation of the dietary data. This is important because this variation biases estimates of the prevalence of inadequate or excessive intakes in a population (Section 3.3.2) and can distort correlations and other measures of association, thus obscuring diet-disease relationships (Section 5.3).

Though related to repeatability under the umbrella of reliability, reproducibility is distinct from repeatability because of non-negligible changes in dietary intake over time. Repeatability of dietary assessment methods cannot be determined because replicate observations in dietary assessment are impossible. Reproducibility is also different from dependability, which assesses the extent to which measured differences reflect actual differences in intake.

Reproducibility should be considered along with other properties, including validity (Chapter 7), to assess whether a method is suited to the research objective and context. For example, a method may have high reproducibility but lack validity for assessing the dietary components of interest in the target population.

This chapter describes the assessment of reproducibility in the most commonly used dietary assessment methods. Some notes on the statistical techniques used to assess reproducibility are also included, though researchers are encouraged to consult with statisticians to guide the optimal application and interpretation of such techniques.

CITE AS: Kirkpatrick S. Reproducibility in Dietary Assessment. https://nutritionalassessment.org/reproducibility/
Email: skirkpat@connect.uwaterloo.ca
Licensed under CC-BY-4.0

6.1 Introduction

Conventionally, reproducibility has been determined using a "test-retest" design, in which the same dietary method is repeated on the same individuals over the same period, after a preselected time interval. The selection of the time interval depends on the time frame of the dietary method used. For instance, for food frequency questionnaires aimed at capturing intake over the past year, variation between repeat administrations separated by a few months likely reflects variation associated with completion of the questionnaire, whereas variation over longer intervals likely reflects true changes in intake (Willett et al. 1985). Very short intervals between administrations, such as a few days, should be avoided so that the second measurement is not influenced by the first. The effects of season or changes in dietary practices over time must also be avoided. In low-income countries, the effects of season on food availability, and thus food and nutrient intakes, may be marked.

Assessment of reproducibility using a test-retest design has other limitations. Reproducibility is a function of both the uncertainty resulting from true variation in daily nutrient intakes within individuals and random errors in measurement. Normally, these sources of uncertainty cannot be distinguished. However, random measurement errors can be reduced by incorporating various quality control procedures into the dietary assessment method (Section 5.4). For instance, Murtas et al. (2018) found that aiding children and adolescents in the completion of an initial web-based 24h recall improved reproducibility.

There will always be a tendency for some dietary assessment methods to have higher reproducibility than others because some designs limit the recording of variability in food and, hence, nutrient intakes. An example is a food frequency questionnaire in which estimates of portion size are based on a single set of "standard reference" portions (Hankin et al. 1978). Moreover, because food frequency questionnaires and dietary histories are designed to assess the usual food intake of an individual over a relatively long period, they are not sensitive to day-to-day variations in intake. Reproducibility may also be high, even if some individuals consistently under- or over-estimate the portion sizes consumed. Thus, even if the dietary assessment method appears to have high reproducibility using a test-retest design, it does not necessarily produce accurate data. Conversely, a lack of agreement between two sets of food and nutrient intake results may not reflect poor reproducibility in the method; intake may have changed in the interval between the two measurements because of usual daily variation in consumption. Perfect reproducibility at the individual level is unlikely. Further, reproducibility can vary based on the individual whose intake is measured, as well as the individual administering the dietary assessment method (Frongillo et al. 2019), such as in the case of interviewer-administered recalls.

In general, the reproducibility of a dietary assessment method depends on the time frame of the method, the population group under study, the nutrient or other dietary component (e.g., alcohol) of interest, the method used to measure the foods and quantities consumed, and the between- and within-person variances.

6.1.1 Twenty-four-hour recalls

Fewer studies have been published on the reproducibility of 24h recalls than on other methods, particularly food frequency questionnaires. This is logical given the recognition of large within-person day-to-day variation in intake (undependability; Frongillo et al. 2019) that results in variation across repeat recalls. Therefore, the test-retest reliability of recalls at the individual level is low. Methods to estimate distributions of usual intake among populations leverage repeat recalls to partition and adjust for the within-person variation (Section 3.3.2). Nonetheless, 24h recalls can provide relatively reproducible estimates of the mean usual intakes of a group, particularly if the sample size is large and days in all parts of the week are represented. The number of individuals required to estimate the average usual intake of a group with a specified degree of precision can be calculated; an example is given in Section 3.3.1.

Representing all days of the week is important because food consumption can vary across days, for example, based on cultural and environmental factors (Gibson et al. 2017). Failure to proportionately cover both weekdays and weekend days will bias estimation of within-person variation and estimates of usual intake (Tarasuk and Beaton 1992). When repeated 24h recalls are used, nonconsecutive days are preferred, because eating behaviors on consecutive days are correlated (Hartman et al. 1990). As a result, within-person variation is underestimated and estimates based on consecutive days may appear to be more reproducible than those based on non-consecutive days (Tarasuk and Beaton 1992).

When the within-person and between-person variances are equal, the corresponding variance ratio is 1.0. The higher the ratio of the within-person variation to the between-person variation, the greater the number of days needed to estimate intake of the dietary component with some degree of precision  (De Castro et al. 2014). Basiotis et al. (1987) calculated that only four 1d records were required to estimate average protein intake over 1y within ±10% of true usual intake 95% of the time for a group of 16 adult women, compared to 44 1d records for average intake of vitamin A for the group over 1y for the same degree of precision. Comparing variance ratios across studies is challenging due to their dependence on the nutrient, sample size, population, and dietary methodology used.

Single 24h recalls have sometimes been used to make inferences about the usual intake of individuals, presumably on the assumption that intake over one 24h period adequately represents the habitual intake. This assumption is not correct. Any estimate of an individual's usual intake, based on a single 24h recall, has low reproducibility because of relatively large within-person variation in food intake. Nevertheless, single 24h recalls can be used to assess actual intakes of food and nutrients, sometimes required for metabolic studies or counseling purposes.

The reproducibility of the measurement of the usual intake of an individual can be improved by obtaining several 24h recalls or records for the same individual, preferably on nonconsecutive days. Equations have been developed for calculating the number of replicate days required to characterize the average intake of an individual with a desired level of precision (Liu et al. 1978; Beaton et al. 1979; Black et al. 1983; Marr and Heady 1986; Basiotis et al. 1987; Nelson et al. 1989) as noted in Section 3.3.3. Obtaining an estimate within 10% of the true usual intake requires more days of intake data per individual than when an estimate within ±20% is required (Basiotis et al. 1987). For example, using dietary data from the Food Habits of Canadians Survey, Palaniappan et al. 2003 calculated that thirty 24h recalls per person were required to obtain an estimate of the energy intake of an individual within ±10% of the true usual intake 95% of the time, compared to eight and three 24h recalls per person when estimates within ±20% and ±30% respectively are required.
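
The dependence of the number of days on the desired precision can be sketched with the formula commonly attributed to Beaton et al. (1979), n = (Z × CVw / D0)², where Z is the normal deviate for the chosen confidence level, CVw is the within-person coefficient of variation (%), and D0 is the acceptable deviation from true usual intake (%). The function name and the within-person CV of 25% below are assumptions made for illustration; they are not taken from any of the cited studies.

```python
import math


def days_for_individual(cv_within_pct, limit_pct, z=1.96):
    """Days needed for one person's mean intake to fall within +/- limit_pct
    of true usual intake, using n = (z * CVw / D0)^2, rounded up.

    cv_within_pct: within-person coefficient of variation, in percent.
    limit_pct: acceptable deviation from true usual intake, in percent.
    z: normal deviate for the chosen confidence (1.96 for about 95%).
    """
    if cv_within_pct < 0 or limit_pct <= 0:
        raise ValueError("cv_within_pct must be >= 0 and limit_pct must be > 0")
    return math.ceil((z * cv_within_pct / limit_pct) ** 2)


# Hypothetical within-person CV of 25% (illustrative only).
for limit in (10, 20, 30):
    print(f"within +/-{limit}%: {days_for_individual(25.0, limit)} days")
```

Because the precision limit enters as an inverse square, doubling the acceptable deviation cuts the required days to roughly one quarter, which is the pattern seen in the thirty, eight, and three recalls reported by Palaniappan et al. 2003.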

The number of days required also depends on the variability of the dietary component of interest, the study group, and the dietary methodology used (Beaton et al. 1979; Basiotis et al. 1987; Nelson et al. 1989; Institute of Medicine 2000). Among pregnant women in Indonesia, for example, only six 24h recalls were needed to estimate intake of energy, carbohydrate, vitamin A, iron, and vitamin C for an individual within ±20% of the true usual intake, whereas for calcium, 24 replicates would be required (Persson et al. 2001). In the studies of Palaniappan et al. 2003 and Persson et al. 2001, the equation developed by Beaton et al. (1979) was used. In the Indonesian study, the between-individual variation for energy and nutrients was greater relative to the within-person variation, perhaps because of the limited number of foods consumed. In higher-income countries, the within-person variation for nutrients is generally greater than the between-person variation (Palaniappan et al. 2003), though this may not always be the case in children, as discussed further in Section 6.1.2.

Several studies have been conducted to examine within- and between-person variation among children. For instance, based on recalls collected in the U.S. National Health and Nutrition Examination Survey, Ollberding et al. 2014 found that within-person variation was higher than between-person variation for all nutrients examined, though the variance ratio was higher for those aged 6 to 11y compared to those aged 12 to 17y. Based on these findings, a higher number of recalls (Table 6.1) is needed for younger children versus adolescents to rank-order intakes with a specified level of accuracy, indicated in this study by the correlation between observed intake and estimated usual intake. To inform the design of randomized controlled trials, St. George et al. 2016 examined the reliability of recalls for assessing intakes of energy, fat, fruit, and vegetables among 456 African American children and adolescents who completed up to three 24h recalls (Figure 6.1), whereas Padilha et al. 2017 examined the number of recalls needed to estimate intake of a range of nutrients among Brazilian children aged 13 to 32 mos. Studies using food records have also examined within- and between-person variation among both children and adults (Section 6.1.2).

Table 6.1. Within- to between-person variance ratio, number of days required for specified correlation between observed and usual intake, and correlation between three 24h diet recalls and usual intake for selected nutrients according to sex and age group among children and adolescents aged 6 to 17y participating in 2007‑2010 NHANES.
VR: Within- to between-individual variance ratio.
D: Number of days required to ensure specified r value for correlation between observed and usual nutrient intake.
r: Correlation between observed and usual nutrient intake expected for three 24h dietary recalls.
Sample sizes: Boys 6-11y (n=940); Boys 12-17y (n=830); Girls 6-11y (n=928); Girls 12-17y (n=775).
Data from Ollberding et al. (2014).

| Nutrients | Boys 6-11y VR | Boys 6-11y D | Boys 6-11y r | Boys 12-17y VR | Boys 12-17y D | Boys 12-17y r | Girls 6-11y VR | Girls 6-11y D | Girls 6-11y r | Girls 12-17y VR | Girls 12-17y D | Girls 12-17y r |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Energy, kcal | 2.8 | 5 | 0.72 | 1.8 | 3 | 0.79 | 2.8 | 5 | 0.72 | 1.7 | 3 | 0.80 |
| Protein, g | 3.8 | 7 | 0.67 | 2.6 | 5 | 0.73 | 4.1 | 7 | 0.65 | 3.0 | 5 | 0.71 |
| Carbohydrate, g | 2.3 | 4 | 0.75 | 1.8 | 3 | 0.79 | 2.6 | 5 | 0.73 | 1.7 | 3 | 0.80 |
| Total sugars, g | 2.2 | 4 | 0.76 | 1.8 | 3 | 0.79 | 2.7 | 5 | 0.72 | 2.2 | 4 | 0.76 |
| Total fat, g | 4.1 | 7 | 0.65 | 2.3 | 4 | 0.75 | 3.8 | 7 | 0.66 | 2.5 | 4 | 0.74 |
| Saturated fat, g | 4.0 | 7 | 0.65 | 2.0 | 4 | 0.77 | 4.3 | 8 | 0.64 | 2.6 | 5 | 0.73 |
| Monounsaturated fat, g | 4.3 | 8 | 0.64 | 2.7 | 5 | 0.73 | 4.4 | 8 | 0.64 | 2.6 | 5 | 0.73 |
| Polyunsaturated fat, g | 4.5 | 8 | 0.63 | 3.5 | 6 | 0.68 | 3.6 | 6 | 0.67 | 3.5 | 6 | 0.68 |
| Cholesterol, mg | 4.2 | 7 | 0.64 | 3.1 | 5 | 0.70 | 8.1 | 14 | 0.52 | 3.7 | 7 | 0.67 |
| Fiber, g | 2.4 | 4 | 0.74 | 2.3 | 4 | 0.75 | 3.6 | 6 | 0.68 | 1.8 | 3 | 0.79 |
| Vitamin A, mg | 3.4 | 6 | 0.68 | 1.9 | 3 | 0.78 | 3.6 | 6 | 0.67 | 2.7 | 5 | 0.73 |
| Vitamin E, mg | 3.6 | 6 | 0.68 | 3.4 | 6 | 0.68 | 4.1 | 7 | 0.65 | 3.4 | 6 | 0.69 |
| Vitamin C, mg | 3.4 | 6 | 0.68 | 2.5 | 4 | 0.74 | 3.3 | 6 | 0.69 | 2.8 | 5 | 0.72 |
| Vitamin D, mg | 2.2 | 4 | 0.76 | 1.8 | 3 | 0.79 | 2.2 | 4 | 0.76 | 2.0 | 4 | 0.77 |
| Thiamin, mg | 4.7 | 8 | 0.62 | 1.8 | 3 | 0.79 | 4.0 | 7 | 0.66 | 2.3 | 4 | 0.75 |
| Riboflavin, mg | 2.7 | 5 | 0.73 | 1.7 | 3 | 0.80 | 2.6 | 5 | 0.73 | 1.9 | 3 | 0.78 |
| Niacin, mg | 6.7 | 12 | 0.56 | 2.7 | 5 | 0.73 | 4.3 | 8 | 0.64 | 3.8 | 7 | 0.66 |
| Vitamin B-6, mg | 4.8 | 9 | 0.62 | 2.6 | 5 | 0.73 | 3.9 | 7 | 0.66 | 2.8 | 5 | 0.72 |
| Vitamin B-12, mg | 3.5 | 6 | 0.68 | 2.3 | 4 | 0.76 | 2.9 | 5 | 0.71 | 3.0 | 5 | 0.71 |
| Folate, mg | 4.2 | 8 | 0.64 | 2.0 | 4 | 0.77 | 4.1 | 7 | 0.65 | 2.1 | 4 | 0.77 |
| Calcium, mg | 3.3 | 6 | 0.69 | 1.8 | 3 | 0.79 | 3.0 | 5 | 0.71 | 2.1 | 4 | 0.77 |
| Iron, mg | 5.5 | 10 | 0.59 | 1.7 | 3 | 0.80 | 4.3 | 8 | 0.64 | 2.0 | 4 | 0.78 |
| Zinc, mg | 4.5 | 8 | 0.63 | 3.0 | 5 | 0.71 | 4.1 | 7 | 0.65 | 3.7 | 7 | 0.67 |
| Potassium, mg | 2.6 | 5 | 0.73 | 2.2 | 4 | 0.76 | 2.2 | 4 | 0.76 | 1.9 | 3 | 0.78 |
| Sodium, mg | 2.9 | 5 | 0.71 | 2.2 | 4 | 0.76 | 3.0 | 5 | 0.70 | 2.2 | 4 | 0 |
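
The three quantities reported in Table 6.1 are linked by a standard relationship between the variance ratio (VR) and the correlation r between observed and usual intake: D = [r² / (1 − r²)] × VR, or equivalently r = sqrt(d / (d + VR)) for d days. The sketch below assumes that relationship; the function names are illustrative, and the target of r = 0.8 is an assumption chosen for the example rather than a value stated in the table notes.

```python
import math


def days_for_correlation(variance_ratio, target_r):
    """Days needed so that the correlation between observed mean intake and
    usual intake reaches target_r, given the within/between variance ratio."""
    if not 0 < target_r < 1:
        raise ValueError("target_r must lie strictly between 0 and 1")
    return (target_r ** 2 / (1 - target_r ** 2)) * variance_ratio


def correlation_for_days(variance_ratio, days):
    """Expected correlation between the mean of `days` recalls and usual intake."""
    if days <= 0 or variance_ratio < 0:
        raise ValueError("days must be > 0 and variance_ratio must be >= 0")
    return math.sqrt(days / (days + variance_ratio))


# VR = 2.8 is the value listed for energy among boys aged 6 to 11y.
print(round(correlation_for_days(2.8, 3), 2))
print(round(days_for_correlation(2.8, 0.8)))  # assumed target r = 0.8
```

With VR = 2.8 and three recalls, the second function returns about 0.72, in line with the r column for energy among boys aged 6 to 11y.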

Figure 6.1 Reliability estimates for dietary outcomes based on number of observations (i.e., dietary recalls) in African American youth who completed registered dietitian nutritionist-administered recalls. Modified from St. George et al. 2016.
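
One common way to project how reliability grows with the number of recalls is the Spearman-Brown formula, which takes the reliability of a single recall (for example, an intraclass correlation) and returns the reliability of the mean of k recalls. The sketch below is an illustration under that assumption; it is not necessarily the approach used by St. George et al. 2016, and the single-recall reliability of 0.3 is hypothetical.

```python
import math


def reliability_of_mean(single_reliability, k):
    """Spearman-Brown projection: reliability of the mean of k recalls."""
    if not 0 < single_reliability < 1 or k < 1:
        raise ValueError("single_reliability must be in (0, 1) and k >= 1")
    return k * single_reliability / (1 + (k - 1) * single_reliability)


def recalls_needed(single_reliability, target):
    """Smallest number of recalls whose mean reaches the target reliability."""
    if not 0 < single_reliability < 1 or not 0 < target < 1:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    k = target * (1 - single_reliability) / (single_reliability * (1 - target))
    return max(1, math.ceil(k))


# Hypothetical single-recall reliability of 0.3 (illustrative only).
for k in (1, 2, 3, 5, 10):
    print(k, round(reliability_of_mean(0.3, k), 2))
print(recalls_needed(0.3, 0.8))
```

The curve rises steeply for the first few recalls and then flattens, which is why adding recalls beyond a certain point yields diminishing gains in reliability.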

Whether the foods (and corresponding nutrients) of interest are staples and/or are subject to seasonal variation is also relevant to the reproducibility of estimates from 24h recalls. Although the food supply and dietary intakes may be relatively stable across seasons in some countries (discussed further in Section  6.2.5), this is not universally the case. In a sample of adolescents from South Africa, Rankin et al. 2012 found higher intraclass correlation coefficients (reproducibility coefficients) (Table 6.2) for nutrients and food groups based on four and five recalls compared to two and three. Recalls were completed in March (autumn), May, June (winter), August, and September (spring). Relatively high coefficients were observed for staple foods, including maize meal and bread, whereas lower coefficients were observed for fruit and vegetables, availability of which varied across seasons (Rankin et al. 2012). Caswell et al. 2020 found that season accounted for 3‑20% of variance in nutrient intakes among children aged 4‑8y living in rural villages and peri-urban towns in Zambia who completed up to seven 24h recalls at monthly intervals, covering the late post-harvest, early lean, and late lean seasons (Table 6.3). Accordingly, the authors (Caswell et al. 2020) emphasized the need for replicate recalls across seasons to estimate long-term usual intakes in settings in which availability of nutrient-rich foods varies seasonally. Gibson et al. 2017 note that feast days should be avoided as corresponding dietary practices may diverge from usual eating.

Table 6.2. The mean reproducibility coefficients (RC) and 95% confidence intervals (-95%CI and +95%CI) of different nutrients and food groups for different numbers of 24h recalls (n = 87). Data from Rankin et al. (2012).

| Description | Two recalls: RC | Two recalls: -95%CI | Two recalls: +95%CI | Three recalls: RC | Three recalls: -95%CI | Three recalls: +95%CI | Four recalls: RC | Four recalls: -95%CI | Four recalls: +95%CI | Five recalls: RC | Five recalls: -95%CI | Five recalls: +95%CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Energy | 0.17 | -0.21 | 0.46 | 0.28 | -0.03 | 0.49 | 0.54 | 0.33 | 0.67 | 0.54 | 0.35 | 0.67 |
| Nutrients | | | | | | | | | | | | |
| Carbohydrate | 0.15 | -0.24 | 0.45 | 0.26 | -0.05 | 0.48 | 0.60 | 0.40 | 0.71 | 0.56 | 0.37 | 0.68 |
| Protein | 0.04 | -0.37 | 0.37 | 0.35 | 0.06 | 0.55 | 0.55 | 0.34 | 0.68 | 0.59 | 0.40 | 0.70 |
| Fat | 0.08 | -0.33 | 0.40 | 0.13 | -0.22 | 0.39 | 0.43 | 0.19 | 0.59 | 0.49 | 0.28 | 0.63 |
| Calcium | 0.38 | 0.04 | 0.60 | 0.28 | -0.03 | 0.49 | 0.56 | 0.36 | 0.69 | 0.53 | 0.34 | 0.66 |
| Iron | -0.03 | -0.46 | 0.33 | 0.36 | 0.08 | 0.56 | 0.53 | 0.31 | 0.66 | 0.53 | 0.33 | 0.66 |
| Zinc | 0.18 | -0.20 | 0.46 | 0.33 | 0.04 | 0.53 | 0.47 | 0.24 | 0.62 | 0.55 | 0.36 | 0.68 |
| Vitamin B12 | -0.22 | -0.70 | 0.20 | 0.22 | -0.11 | 0.45 | 0.40 | 0.15 | 0.57 | 0.35 | 0.10 | 0.53 |
| Vitamin B6 | 0.17 | -0.20 | 0.45 | 0.44 | 0.17 | 0.61 | 0.59 | 0.39 | 0.71 | 0.58 | 0.39 | 0.69 |
| Vitamin C | 0.22 | -0.15 | 0.49 | 0.35 | 0.07 | 0.55 | 0.43 | 0.19 | 0.59 | 0.44 | 0.21 | 0.59 |
| Vitamin A | 0.42 | 0.09 | 0.62 | 0.32 | 0.02 | 0.52 | 0.52 | 0.30 | 0.66 | 0.52 | 0.32 | 0.65 |
| Folate | 0.04 | -0.37 | 0.37 | 0.32 | 0.02 | 0.52 | 0.58 | 0.37 | 0.70 | 0.54 | 0.34 | 0.66 |
| Riboflavin | -0.01 | -0.44 | 0.34 | 0.30 | 0.00 | 0.51 | 0.25 | -0.05 | 0.46 | 0.31 | 0.04 | 0.50 |
| Thiamine | 0.27 | -0.09 | 0.52 | 0.24 | -0.08 | 0.47 | 0.45 | 0.22 | 0.60 | 0.46 | 0.24 | 0.61 |
| Niacin | 0.43 | 0.11 | 0.62 | 0.49 | 0.23 | 0.64 | 0.55 | 0.35 | 0.68 | 0.60 | 0.41 | 0.71 |
| Food groups | | | | | | | | | | | | |
| Maize meal | 0.49 | 0.17 | 0.67 | 0.47 | 0.21 | 0.63 | 0.59 | 0.39 | 0.71 | 0.65 | 0.48 | 0.75 |
| Bread group | 0.26 | -0.10 | 0.52 | 0.31 | 0.01 | 0.52 | 0.57 | 0.36 | 0.69 | 0.61 | 0.44 | 0.72 |
| Cereal group | 0.01 | -0.40 | 0.35 | 0.30 | 0.00 | 0.51 | 0.51 | 0.29 | 0.65 | 0.55 | 0.35 | 0.67 |
| Milk group | 0.55 | 0.24 | 0.70 | 0.50 | 0.25 | 0.65 | 0.51 | 0.29 | 0.65 | 0.56 | 0.37 | 0.68 |
| Meat group | 0.13 | -0.27 | 0.43 | 0.14 | -0.21 | 0.40 | 0.38 | 0.12 | 0.55 | 0.45 | 0.23 | 0.60 |
| Fruit & veg | 0.06 | -0.35 | 0.39 | 0.26 | -0.05 | 0.48 | 0.49 | 0.27 | 0.63 | 0.44 | 0.21 | 0.59 |
| Sweets | 0.06 | -0.36 | 0.39 | 0.23 | -0.09 | 0.46 | 0.50 | 0.27 | 0.64 | 0.40 | 0.16 | 0.5 |

Table 6.3. Within-person, between-person and seasonal components of variance as percentage of total variance and within- to between-person variance ratios (Ratio) for energy and nutrient intakes, among 4y to 8y-old participants in the non-intervened arm of a biofortified maize efficacy trial in Mkushi, Zambia, 2012-2013. Data from Caswell et al. 2020.

| Nutrients | Unadjusted: Within-person (%) | Unadjusted: Between-person (%) | Unadjusted: Seasonal (%) | Unadjusted: Ratio | Adjusted: Within-person (%) | Adjusted: Between-person (%) | Adjusted: Seasonal (%) | Adjusted: Ratio |
|---|---|---|---|---|---|---|---|---|
| Energy (kcal/d) | 80.7 | 11.0 | 8.3 | 7.3 | 71.3 | 12.1 | 6.2 | 5.9 |
| Protein (g/d) | 87.2 | 8.6 | 4.2 | 10.1 | 80.9 | 8.4 | 5.5 | 9.7 |
| Fat (g/d) | 81.9 | 15.2 | 2.9 | 5.4 | 76.4 | 16.1 | 2.9 | 4.8 |
| Carbohydrates (g/d) | 81.2 | 6.1 | 12.7 | 13.3 | 70.9 | 7.5 | 8.0 | 9.4 |
| Ca (mg/d) | 87.6 | 3.4 | 9.0 | 25.5 | 83.2 | 3.0 | 11.6 | 27.5 |
| Fe (mg/d) | 91.1 | 4.5 | 4.3 | 20.1 | 85.0 | 4.1 | 4.6 | 20.8 |
| Zn (mg/d) | 84.3 | 10.2 | 5.5 | 8.3 | 77.7 | 10.7 | 3.7 | 7.3 |
| Vitamin A (µg RAE/d) | 92.5 | 6.4 | 1.1 | 14.4 | 88.5 | 4.9 | 3.5 | 17.9 |
| Thiamin (mg/d) | 75.0 | 9.4 | 15.6 | 8.0 | 65.1 | 8.4 | 17.5 | 7.7 |
| Riboflavin (mg/d) | 79.5 | 0.3 | 20.2 | 232.8 | 69.1 | 1.1 | 23.0 | 65.5 |
| Niacin (mg/d) | 86.7 | 3.3 | 10.0 | 26.3 | 78.6 | 2.6 | 14.3 | 30.7 |
| Vitamin B6 (mg/d) | 73.3 | 12.7 | 14.0 | 5.8 | 63.3 | 9.0 | 17.3 | 7.0 |
| Folate (µg/d) | 84.6 | 5.0 | 10.3 | 16.8 | 80.6 | 6.1 | 9.1 | 13.2 |
| Vitamin B12 (µg/d) | 89.1 | 8.1 | 2.9 | 11.0 | 87.0 | 7.9 | 4.1 | 11.0 |
| Vitamin C (mg/d) | 88.8 | 0.0 | 11.2 | | 84.8 | 0.0 | 9.4 | |
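
The Ratio column in Table 6.3 can be approximated from the percentage components, since a within- to between-person variance ratio is simply the within-person share divided by the between-person share. The helper below is a minimal sketch with an illustrative name; it returns `None` when the between-person share is zero, which is presumably why no ratio is listed for vitamin C. Published ratios were likely computed from unrounded components, so values recomputed from the rounded percentages can differ, especially when the between-person share is very small.

```python
def within_to_between_ratio(within_pct, between_pct):
    """Ratio of within- to between-person variance from percentage shares.

    Returns None when the between-person share is zero (ratio undefined).
    """
    if within_pct < 0 or between_pct < 0:
        raise ValueError("variance shares cannot be negative")
    if between_pct == 0:
        return None
    return within_pct / between_pct


# Unadjusted shares for energy, fat, and vitamin C as listed in Table 6.3.
for label, within, between in [("Energy", 80.7, 11.0),
                               ("Fat", 81.9, 15.2),
                               ("Vitamin C", 88.8, 0.0)]:
    ratio = within_to_between_ratio(within, between)
    print(label, "undefined" if ratio is None else round(ratio, 1))
```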

Further research using repeated 24h recalls to partition within- from between-person variation in diverse population groups is essential for informing calculations of the number of days needed to characterize average usual intake at the individual level with a specified level of precision. In using the results of such research to inform study design, it is critical that researchers carefully consider their objectives and the corresponding data requirements and considerations, as outlined in Section 3.3.

6.1.2 Food records

To minimize errors resulting from memory lapses and inadequate estimation of portion size, a weighed food record is sometimes used. A 7-d weighed record has often been considered appropriate for estimating the average usual nutrient intakes of individuals. However, the respondent burden is high, and problems with compliance may arise. Consequently, shorter periods, ranging from 2 to 5d, are often used. For example, the U.K. National Diet and Nutrition Surveys shifted from 7d weighed to 4d estimated food diaries in 2008 (and more recently, to four 24h recalls collected using Intake24, a web-based interface) (Venables et al. 2022). Sometimes, 1d weighed food records are collected to assess actual food intakes of individuals participating in metabolic studies or for counseling purposes.

In general, studies of the reproducibility of a 7d weighed record have found good agreement between group mean values obtained for energy and most nutrients on two separate occasions, except when individuals have been on special diets. In a study of nurses living in the United States, Pearson's and intraclass correlation coefficients were used at the individual level to assess the reproducibility of two 7d weighed records  (Section  6.3.5). As shown in Table 6.4, intraclass correlation coefficients ranged from 0.41 to 0.79, the lowest being for total vitamin A without supplements (ri = 0.41) and polyunsaturated fat (ri = 0.45)  (Willett 2013).
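
The difference between the two coefficients can be illustrated with a small sketch. Pearson's r measures only linear association, whereas an intraclass correlation also penalizes systematic shifts between administrations. The code below uses a one-way random-effects intraclass correlation, which is one common definition and an assumption here, since the exact form used in the cited study is not stated in this section. The intake values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two paired lists of intakes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_oneway(x, y):
    """One-way random-effects intraclass correlation for two administrations."""
    n, k = len(x), 2
    person_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand_mean = sum(person_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x, y, person_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical protein intakes (g/d); the second record is shifted up by 10 g.
first = [50, 60, 70, 80, 90]
second = [v + 10 for v in first]
print(round(pearson_r(first, second), 2))   # perfect linear association
print(round(icc_oneway(first, second), 2))  # lowered by the systematic shift
```

In practice the inputs would often be log-transformed first, as was done for the coefficients in Table 6.4.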

Table 6.4. Reproducibility of 7d weighed food records. The data were obtained on two occasions 1y apart from 173 female registered nurses, aged 34 to 59, residing in the Boston area, 1980–1981. Correlation coefficients (Record 1 vs. 4) were calculated on log-transformed data to improve normality. From Willett et al. (1985).

| Nutrient | Pearson r | Intraclass ri |
|---|---|---|
| Protein | 0.56 | 0.56 |
| Total fat | 0.57 | 0.54 |
| Saturated fat | 0.57 | 0.56 |
| Polyunsaturated fat | 0.44 | 0.45 |
| Cholesterol | 0.54 | 0.53 |
| Total carbohydrate | 0.74 | 0.72 |
| Sucrose | 0.60 | 0.66 |
| Crude fiber | 0.65 | 0.65 |
| Total vitamin A | 0.47 | 0.56 |
| Total vitamin A, without suppl. | 0.34 | 0.41 |
| Vitamin B6 | 0.68 | 0.79 |
| Vitamin B6, without suppl. | 0.60 | 0.60 |
| Vitamin C | 0.67 | 0.70 |
| Vitamin C, without suppl. | 0.63 | 0.68 |
| Total calories | 0.67 | 0.63 |

Sempos et al. 1985 were among the first investigators to use analysis of variance estimates of the within- and between-person variation to assess the reproducibility of food records at the individual level. In this study, 2d food records were collected on two randomly selected days per sampling month over a 2y period. The participants were 151 middle-aged women. For all 15 nutrients examined, within-person variation was greater than between-person variation. In some low-income countries, where diets are often less varied compared to higher-income countries, high within- to between-person variance ratios have sometimes (Nyambose et al. 2002), but not always (Persson et al. 2001), been reported. For example, in a study of energy and nutrient intakes of pregnant women in rural Malawi, weighed intakes were collected on average for six consecutive days per woman. Variance ratios (i.e., ratios of within- to between-person variances) were then calculated for women during the second trimester of pregnancy; they ranged from 1.1 for fat to 10 for vitamin B12. In this study, only individual intakes of energy, protein, carbohydrate, and fiber could be determined within 30% of their true usual intakes with 10 replicate days of weighed intake (Nyambose et al. 2002). To obtain estimates within 20% of the true usual intakes of energy, protein, carbohydrates, and fiber, from 8 to 23 record days were needed. For the micronutrients, from 95 to 213 record days were required. These findings show how difficult it may be to obtain precise estimates of an individual's usual intake of micronutrients.
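
The analysis of variance approach can be sketched for the simplest case: a balanced design in which every person contributes the same number of record days. Under a one-way random-effects model, the within-person variance is estimated by the within-person mean square, and the between-person variance by (MSB − MSW) / k, where k is the number of days per person. The function name and the intake values below are hypothetical, and real analyses typically also handle unbalanced data, transformations, and covariates such as day of week.

```python
def variance_components(intakes):
    """Estimate within- and between-person variance from a balanced design.

    intakes: list of per-person lists, each holding the same number of days.
    Returns (within_variance, between_variance, variance_ratio); the ratio is
    None when the between-person estimate is not positive.
    """
    n = len(intakes)
    k = len(intakes[0]) if n else 0
    if n < 2 or k < 2 or any(len(person) != k for person in intakes):
        raise ValueError("need at least 2 people with the same number (>= 2) of days")
    person_means = [sum(person) / k for person in intakes]
    grand_mean = sum(person_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum((value - m) ** 2
                    for person, m in zip(intakes, person_means)
                    for value in person) / (n * (k - 1))
    within_var = ms_within
    between_var = max((ms_between - ms_within) / k, 0.0)  # truncate at zero
    ratio = within_var / between_var if between_var > 0 else None
    return within_var, between_var, ratio


# Hypothetical iron intakes (mg/d) for three people over four record days.
records = [[2, 14, 6, 10], [6, 18, 10, 14], [10, 22, 14, 18]]
within_var, between_var, ratio = variance_components(records)
print(round(within_var, 2), round(between_var, 2), round(ratio, 2))
```

A ratio above 1, as in this example, indicates that day-to-day variation within a person exceeds the variation in usual intake between people.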

Several studies have used food records to examine variance in intake among children. Lanigan et al. 2004, studying 72 infants and children aged 6mos to 2y for whom parents completed a 5d weighed food record, found that between-person variation was greater than within-person variation in this age group. Similarly, in a sample of 1639 Finnish children, Erkkola et al. 2011 found that within-person variation was generally lower than between-person variation among 1y olds except for cholesterol, vitamin A, beta-carotene, and vitamin B12; however, this was not the case for most nutrients among 3y- and 6y-old children. In a study of Flemish pre-schoolers, Huybrechts et al. 2008 also found that within-person variation was lower than between-person variation for most micronutrients among children aged 2.5‑3y, but this was not the case for most nutrients among those aged 4‑6.5y (Table 6.5). Accordingly, the authors estimated that fewer days of intake would be sufficient for younger versus older children. However, using data from weighed and estimated food records from close to 3000 Brazilian children aged 1‑6y, de Castro et al. 2014 found that within-person variation was higher than between-person variation for all nutrients examined, with variance ratios ranging from 1.17 for calcium to 8.70 for fat in children aged 1‑2y and from 1.47 for calcium to 8.95 for fat among children aged 3‑6y. There was some variability in the ratio of within- to between-person variation by age and body weight status, but the authors concluded that seven days of intake was sufficient to rank children into tertiles of intake for most nutrients (de Castro et al. 2014). Erkkola et al. 2011 also observed that the ratio of within- to between-person variance was slightly higher in girls than in boys. These studies show the importance of considering the population and context in the design of studies using food records.

Table 6.5. Nutrient intakes calculated from estimated diet records, CVs, variance ratios and the number of days (D) required to ensure r=0.9, for children 2.5-3y and 4-6.5y separately.
a: Ratio of within- to between-person variance
b: Number of days required
c: Log e-transformed data used
N (children 2.5–3y) = 197
N (children 4–6.5y) = 464
Data from Huybrechts et al. 2008.

| Nutrient | Age | Mean (SEM) | CVw (%) | CVb (%) | Ratio (a) | Days (b) |
|---|---|---|---|---|---|---|
| Energy (kJ/day) | <4y | 5915.7 (103.1) | 19.1 | 18.3 | 1.1 | 5 |
| | >4y | 6206.3 (60.6) | 17.8 | 15.1 | 1.4 | 6 |
| Protein (g/day) | <4y | 54.9 (1.1) | 23.5 | 21.7 | 1.2 | 5 |
| | >4y | 56.7 (0.7) | 22.5 | 18.1 | 1.6 | 7 |
| Total fat (g/day) | <4y | 49.2 (1.1) | 31.4 | 20.4 | 2.4 | 10 |
| | >4y | 51.5 (0.7) | 32.3 | 18.8 | 2.9 | 13 |
| Saturated fatty acids (g/day) | <4y | 21.4 (0.5) | 31.3 | 23.6 | 1.8 | 7 |
| | >4y | 22.5 (0.3) | 33.2 | 18.6 | 3.2 | 14 |
| Monounsaturated fatty acids (g/day) | <4y | 18.0 (0.5) | 37.7 | 20.5 | 3.4 | 14 |
| | >4y | 18.8 (0.3) | 37.2 | 20.4 | 3.3 | 14 |
| Monounsaturated fatty acids (g/day) (c) | <4y | 2.8 (0.02) | 13.0 | 7.3 | 3.1 | 13 |
| | >4y | 2.8 (0.02) | 12.9 | 7.1 | 3.3 | 14 |
| Polyunsaturated fatty acids (g/day) | <4y | 7.4 (0.2) | 41.6 | 30.8 | 1.8 | 8 |
| | >4y | 7.7 (0.2) | 44.9 | 29.5 | 2.3 | 10 |
| Polyunsaturated fatty acids (g/day) (c) | <4y | 1.9 (0.03) | 21.9 | 17.1 | 1.6 | 7 |
| | >4y | 1.9 (0.02) | 22.2 | 16.1 | 1.9 | 8 |
| Cholesterol (mg/day) | <4y | 160.8 (4.1) | 49.8 | 16.6 | 9.0 | 38 |
| | >4y | 167.3 (3.0) | 48.2 | 17.6 | 7.5 | 32 |
| Cholesterol (mg/day) (c) | <4y | 5.0 (0.02) | 9.2 | 4.1 | 5.0 | 21 |
| | >4y | 5.0 (0.02) | 8.6 | 4.0 | 4.6 | 19 |
| Carbohydrates (g/day) | <4y | 186.5 (3.7) | 19.8 | 21.7 | 0.8 | 4 |
| | >4y | 197.4 (2.5) | 17.8 | 17.5 | 1.0 | 4 |
| Simple carbohydrates (g/day) | <4y | 105.1 (2.8) | 24.1 | 29.6 | 0.7 | 3 |
| | >4y | 110.6 (1.8) | 24.1 | 23.8 | 1.0 | 4 |
| Complex carbohydrates (g/day) | <4y | 80.9 (1.7) | 25.8 | 19.9 | 1.7 | 7 |
| | >4y | 86.1 (1.2) | 23.6 | 21.1 | 1.3 | 5 |
| Fibre (g/day) | <4y | 12.8 (0.3) | 26.3 | 22.0 | 1.4 | 6 |
| | >4y | 13.6 (0.2) | 26.9 | 23.2 | 1.3 | 6 |
| Water (g/day) | <4y | 1244.3 (24.1) | 17.3 | 22.7 | 0.6 | 2 |
| | >4y | 1276.5 (16.1) | 17.1 | 20.9 | 0.7 | 3 |

6.1.3 Dietary histories

The reproducibility of a dietary history when used to assess usual mean intakes at the group level depends on the time frame used, its method of administration (e.g., face-to-face or telephone interviews), the time lag of the method, the technique of measuring amounts of foods consumed, and the population group.

The dietary history was shown to yield good reproducibility when used to obtain group mean intake information, especially over a relatively short time frame. For example, van Staveren et al. 1985 concluded that their dietary history method, covering 1mo, provided reproducible estimates of the mean energy and nutrient intakes for an average weekday (Table 6.6). On an individual level, high intraclass correlations also suggested good overall agreement between the two dietary histories. For weekend days, reproducibility was poorer, especially for saturated fat, carbohydrate, and linoleic acid, because of greater dietary variability on weekends.

Table 6.6. Reproducibility of a dietary history based on interviews with 47 Dutch adults. Data were collected 1mo apart. From van Staveren et al. 1985.

| Nutrient | 1st interview (mean) | 2nd interview (mean) | SD of differences | Intraclass correlation ri |
|---|---|---|---|---|
| Energy (kcal) | 2352 | 2327 | 434 | 0.86 |
| Protein (g) | 82 | 79 | 14 | 0.80 |
| Fat (g) | 101 | 103 | 26 | 0.81 |
| Saturated fat (g) | 45 | 46 | 9 | 0.89 |
| Linoleic acid (g) | 12 | 14 | 7 | 0.67 |
| Carbohydrate (g) | 256 | 248 | 46 | 0.87 |
| Dietary fiber (g) | 24 | 23 | 6 | 0.75 |
| Alcohol (g) | 12 | 12 | 8 | 0.91 |

In a case-control study on breast cancer in a group of Caucasian and Japanese Hawaiian women, a dietary history questionnaire covering a typical week was repeated after 3mo (Hankin et al. 1983). The amounts of food consumed were estimated using photographs of three serving sizes (small, medium, and large) of each food. Mean intakes of total fat, saturated fat, cholesterol, and animal protein for all participants on the two occasions were not significantly different, as tested by the paired t-test. Furthermore, the extent of the variability in the nutrient intake, as measured by the standard deviation, was also similar in both interviews. Hence, the questionnaire yielded a reproducible estimate of the usual mean intake of total and saturated fat, cholesterol, and animal protein for the entire group. Nevertheless, when the mean intakes for the two interviews were examined by ethnicity, significant differences were found. These investigators suggested that a longer period was required to estimate the usual food intake of White individuals because of greater variability in their usual diets compared with Japanese Hawaiian women.
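
The group-level comparison described above rests on the paired t statistic, computed from the per-person differences between the two administrations. The sketch below returns the mean difference, the standard deviation of the differences, the t statistic, and its degrees of freedom; the function name and fat intakes are hypothetical, and a p-value would additionally require a t distribution (for example from a statistics package), which is left out here.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic for two administrations on the same people.

    Returns (mean_difference, sd_of_differences, t_statistic, degrees_of_freedom).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, sd_diff, t_stat, n - 1


# Hypothetical total fat intakes (g/d) from two interviews 3mo apart.
interview_1 = [100, 90, 80, 110, 95]
interview_2 = [102, 88, 83, 109, 97]
mean_diff, sd_diff, t_stat, df = paired_t(interview_1, interview_2)
print(round(mean_diff, 2), round(sd_diff, 2), round(t_stat, 2), df)
```

A small t statistic indicates that the group means agree, but it says nothing about agreement for individuals, whose differences can be large in both directions and still cancel out.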

The use of only three possible portion sizes in the dietary history questionnaire of Hankin et al. 1983 may have contributed in part to low variability and thus relatively high reproducibility (Block and Hartman 1989). In the same study group, intraclass correlation coefficients from three repeated administrations of a dietary history questionnaire based on variable portion sizes were lower than those from a questionnaire without variable portion sizes (Pietinen et al. 1988a; 1988b).

Dutch investigators evaluated the reproducibility of a cross-check dietary history method covering 1y by repeating the survey four times over a 4y period (van Beresteyn et al. 1987). They noted that mean daily intakes of energy and most nutrients (total protein, vegetable and animal protein, total fat, saturated fat, mono- and polyunsaturated fat, carbohydrate, dietary fiber, calcium, phosphorus, cholesterol, iron, and sodium) for the group of 246 women were very similar over each of the 4y, with correlation coefficients ranging from 0.70 to 0.84. Vitamin A and vitamin C were exceptions, with lower correlation coefficients (0.63 and 0.67, respectively), attributed to variation in intakes of foods containing relatively high concentrations of these vitamins.

Slightly lower correlation coefficients (0.54 to 0.75 depending on the nutrient) were reported by a French group using a self-administered dietary history questionnaire with 238 food items structured according to the French meal pattern (Van Liere et al. 1987). This dietary history was administered twice to 110 participants, with an interval of approximately 1y. The reproducibility of a Spanish dietary history method, the DH-ENRICA, was examined by Guallar-Castillón et al. (2014). The DH-ENRICA is an electronic version based on the DH-EPIC, i.e., the dietary history developed by the European Prospective Investigation into Cancer and Nutrition (EPIC). The method consisted of a computerized interviewer-administered questionnaire in which participants were asked to indicate all foods usually consumed in the past year along with their details, followed by an instrument that collected information on 861 foods using photos for three portion sizes. The dietary history was administered twice, one year apart, to 101 participants recruited by physicians. The mean unadjusted intraclass correlation for all nutrients was 0.45, ranging from 0.14 for iron to 0.62 for ethanol. The correlation for energy was 0.66. For food groups, the mean intraclass correlation was also 0.45, ranging from 0.08 for tubers to 0.60 for alcoholic beverages (Table 6.7).
Table 6.7. Reproducibility of DH-E. Correlation coefficients between DH-E1 (at study baseline) and DH-E2 (at 12mos from baseline). Data from Guallar-Castillón et al. (2014).
Food group or nutrient; Pearson correlation coefficient, unadjusted; Pearson correlation coefficient, energy-adjusted; Intraclass correlation coefficient, unadjusted; Intraclass correlation coefficient, energy-adjusted
Food groups
Cereals 0.61 0.56 0.59 0.54
Milk 0.60 0.59 0.58 0.59
Meat 0.61 0.59 0.58 0.55
Eggs 0.42 0.41 0.34 0.35
Fish 0.40 0.41 0.38 0.39
Oils and fats 0.30 0.31 0.26 0.30
Vegetables 0.59 0.59 0.55 0.55
Legumes 0.42 0.42 0.37 0.38
Tubers 0.08 0.04 0.08 0.04
Fruits 0.51 0.49 0.40 0.40
Dried fruits and nuts 0.40 0.40 0.35 0.39
Chocolate and similar 0.41 0.40 0.42 0.41
Coffee, cocoa and infusions 0.78 0.77 0.77 0.75
Soft drinks 0.45 0.44 0.44 0.43
Alcoholic beverages 0.60 0.59 0.60 0.59
Nutrients
Energy 0.69 N/A 0.66 N/A
Total protein 0.43 0.33 0.32 0.22
Animal protein 0.63 0.59 0.61 0.57
Vegetable protein 0.50 0.45 0.44 0.43
Lipids 0.58 0.51 0.57 0.49
Saturated fatty acids 0.65 0.57 0.65 0.56
Monounsaturated fatty acids 0.52 0.46 0.50 0.44
Polyunsaturated fatty acids 0.39 0.25 0.37 0.24
Linoleic acid (C 18:2, n-6) 0.40 0.26 0.38 0.25
α-linolenic acid (C 18:3, n-3) 0.27 0.26 0.28 0.26
Eicosapentanoic acid, EPA (C 20:5, n-3) 0.40 0.40 0.40 0.40
Docosapentanoic acid, DPA (C 22:5, n-3) 0.50 0.48 0.51 0.49
Docosahexanoic acid, DHA (C 22:6, n-3) 0.40 0.40 0.40 0.40
Trans FA 0.53 0.42 0.51 0.41
Cholesterol 0.54 0.48 0.52 0.46
Total carbohydrates 0.66 0.61 0.58 0.57
Sugars 0.50 0.47 0.45 0.46
Polysaccharides 0.62 0.58 0.60 0.57
Ethanol 0.63 0.62 0.62 0.61
Fiber 0.49 0.51 0.47 0.50
Caffeine 0.40 0.41 0.40 0.41
Sodium 0.45 0.34 0.45 0.34
Potassium 0.52 0.54 0.47 0.52
Calcium 0.48 0.43 0.47 0.42
Magnesium 0.53 0.50 0.44 0.46
Phosphorus 0.53 0.45 0.50 0.43
Iron 0.28 0.19 0.14 0.10
Zinc 0.60 0.48 0.57 0.47
Selenium 0.35 0.31 0.31 0.30
Iodine 0.37 0.34 0.37 0.34
Vitamin A 0.20 0.19 0.19 0.19
Retinoids 0.45 0.35 0.44 0.35
Carotenoids 0.46 0.46 0.45 0.45
Vitamin D 0.21 0.19 0.17 0.17
Vitamin E 0.47 0.45 0.44 0.43
Thiamin 0.45 0.34 0.38 0.30
Riboflavin 0.63 0.56 0.59 0.55
Niacin 0.53 0.47 0.49 0.43
Vitamin B6 0.42 0.35 0.32 0.29
Folic acid 0.32 0.33 0.29 0.31
Vitamin B12 0.17 0.12 0.15 0.12
Vitamin C 0.52 0.51 0.51 0.50
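
The difference between the Pearson and intraclass correlation coefficients reported in Table 6.7 can be illustrated with a short calculation. The sketch below is illustrative only. It assumes a one-way random-effects intraclass correlation for two administrations, which may differ from the model used by Guallar-Castillón et al. (2014), and the function names and intake values are hypothetical.

```python
import numpy as np


def pearson_r(first, second):
    """Pearson correlation between two administrations of an instrument."""
    return float(np.corrcoef(first, second)[0, 1])


def icc_oneway(first, second):
    """One-way random-effects ICC for two administrations per person.

    Assumption: this is ICC(1,1); the cited study may have used another form.
    """
    data = np.column_stack([first, second]).astype(float)
    n, k = data.shape  # n persons, k administrations (here k = 2)
    grand_mean = data.mean()
    person_means = data.mean(axis=1)
    # Mean squares between persons and within persons
    ms_between = k * np.sum((person_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - person_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))


# Hypothetical intakes (g/d): the second administration is uniformly 10 g/d higher
first = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
second = first + 10.0

r = pearson_r(first, second)
icc = icc_oneway(first, second)
print(round(r, 3), round(icc, 3))
```

In this hypothetical example the Pearson coefficient is 1.0 because the ranking and spacing of individuals are unchanged, whereas the intraclass coefficient is about 0.82 because it also penalizes the systematic shift between administrations.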

6.1.4 Food frequency questionnaires

There is a large literature investigating the reproducibility of food frequency questionnaires, particularly as tailored questionnaires are often developed and/or adapted for the setting and/or population of interest. Because food frequency questionnaires must include the foods and beverages available to and consumed by the population under study, many variations have been developed. Even small changes to the design of a questionnaire may affect performance; therefore, each instrument should be evaluated independently (Willett 2013). In general, the correlations noted in the literature between repeat administrations of food frequency questionnaires range from 0.5 to 0.7 (Willett 2013), depending on the study group and its size, the nutrient of interest, and the design of the instrument.

The Willett semiquantitative food frequency questionnaire, discussed in Section 3.1.6, has evolved since 1979 and has undergone multiple evaluations of reproducibility (Willett et al. 1985; Rimm et al. 1992; Feskanich et al. 1993; Al-Shaar et al. 2021). An evaluation of reproducibility of a 61-item version with 173 female registered nurses, after a time lapse of 1y, indicated intraclass correlation coefficients for nutrients, adjusted for total energy intake, that ranged from 0.49 for total vitamin A (without supplements) to 0.71 for sucrose (Willett et al. 1985). A subsequent examination of the paper-based questionnaire, now containing over 150 items, with >600 men from the Health Professionals Follow-Up Study and the Harvard Pilgrim Health Care cohort yielded intraclass correlations ranging from 0.49 for selenium to 0.89 for alcohol (Al-Shaar et al. 2021). These correlations were not substantially changed by energy adjustment. Participants were primarily White and highly educated health professionals, with implications for generalizability.

Several investigators have examined the reproducibility of food frequency questionnaires developed for the European Prospective Investigation into Cancer and Nutrition (EPIC) study. In the Dutch (Ocké et al. 1997a) and U.K. (Bingham et al. 2001) cohorts, self-administered food frequency questionnaires with questions on the habitual consumption frequency of 178 and 130 food items, respectively, were tested. The Dutch food frequency questionnaire was administered three times (at baseline and 6 and 12mo later) to 121 men and women (Ocké et al. 1997a; 1997b), whereas the U.K. EPIC food frequency questionnaire was administered twice to 146 participants over 9mo. In both studies, Pearson's correlation coefficients on log-transformed data were used to test the reproducibility of the nutrient intakes calculated from the food frequency questionnaires. Reproducibility was moderate to high, with correlations ranging from 0.59 to 0.94 in the Dutch study, and from 0.50 to 0.80 in the U.K. study, depending on the nutrient. In these studies, energy adjustment did not consistently increase the correlation coefficients. Nagel et al. 2007 examined the long-term reproducibility of the food frequency questionnaire developed for the Heidelberg cohort of EPIC in more than 20,000 participants (9,530 men and 11,203 women), considering administrations of the questionnaire separated by a mean of 68.8mo. Spearman rank correlations between the repeated measurements for nutrients ranged from 0.43 for vitamin C to 0.74 for ethanol in men (Table 6.8).

Table 6.8. Spearman rank correlations between repeated measurements (baseline (FFQ-1) and after 68.8mos mean follow-up (FFQ-2)) with the same FFQ for selected food groups and nutrients among 9,530 men in the European Prospective Investigation into Cancer and Nutrition-Heidelberg cohort. Data from Nagel et al. 2007.
Food group or nutrient; Baseline median; Follow-up median; Absolute difference, median; Relative change (%), median; Spearman correlation coefficient (rs)*, crude; Spearman correlation coefficient (rs)*, energy adj.
Food groups
Potatoes (g/d) 87.8 82.9 -4.3 -5.2 0.51 0.50
Vegetables (g/d) 103.5 106.8 3.3 3.3 0.41 0.41
Legumes (g/d) 4.0 4.3 0.0 1.5 0.45 0.45
Fruits (g/d) 95.9 99.8 5.2 6.0 0.50 0.50
Dairy products (g/d) 177.6 179.1 0.5 0.4 0.61 0.62
Cereals/cereal products (g/d) 202.6 217.4 10.4 5.1 0.43 0.45
Meat/meat products (g/d) 107.5 104.4 -1.7 2.0 0.60 0.57
Fish/shellfish (g/d) 16.4 19.7 0.0 0.1 0.52 0.50
Eggs/egg products (g/d) 10.0 10.3 0.1 1.6 0.53 0.51
Added fat (g/d) 21.6 20.0 -1.2 6.3 0.47 0.44
Sugar/confectionery (g/d) 32.5 30.5 -1.5 -6.7 0.60 0.56
Cakes (g/d) 50.4 44.4 -3.1 -9.2 0.59 0.57
Non-alcoholic beverages (g/d) 1324.2 1489.7 137.2 10.5 0.48 0.47
Alcoholic beverages (g/d) 319.2 275.2 3.0 -1.5 0.73 0.73
Condiments/sauces (g/d) 12.5 11.7 -0.7 -7.1 0.50 0.48
Soups/bouillon (g/d) 35.9 34.3 -1.3 -4.2 0.52 0.51
Miscellaneous (g/d) 0.1 0.2 0.0 3.3 0.48 0.48
Nutrients
Energy (kJ/d) 8748.3 8672.6 -20.4 -1.0 0.53
Protein (g/d) 73.8 73.8 -0.2 -0.2 0.52 0.50
Fat (g/d) 77.4 74.8 -2.3 -3.2 0.53 0.43
Carbohydrates (g/d) 221.8 222.9 1.7 0.8 0.51 0.52
Ethanol (g/d) 18.9 17.8 -0.2 -2.4 0.74 0.74
β-Carotene (mg/d) 2103.4 2145.5 50.7 2.5 0.46 0.45
Vitamin C (mg/d) 87.5 92.5 4.4 5.5 0.43 0.43
Calcium (mg/d) 723.0 749.2 21.3 3.2 0.54 0.57
In women, the corresponding correlations ranged from 0.45 for vitamin C to 0.77 for ethanol (Table 6.9) (Nagel et al. 2007). Most correlation coefficients were similar or slightly attenuated after adjustment for energy.

Table 6.9. Spearman rank correlations between repeated measurements (baseline (FFQ-1) and after 5.8y mean follow-up (FFQ-2)) with the same FFQ for selected food groups and nutrients among 11,203 women in the European Prospective Investigation into Cancer and Nutrition-Heidelberg cohort. From Nagel et al. 2007.
Food group or nutrient; Baseline median; Follow-up median; Absolute difference, median; Relative change (%), median; Spearman correlation coefficient (rs)*, crude; Spearman correlation coefficient (rs)*, energy adj.
Food groups
Potatoes (g/d) 69.6 67.6 −1.7 −2.6 0.53 0.52
Vegetables (g/d) 110.6 115.1 4.8 4.4 0.45 0.45
Legumes (g/d) 2.4 2.6 0.0 2.3 0.48 0.48
Fruits (g/d) 105.2 110.9 5.7 5.6 0.49 0.49
Dairy products (g/d) 204.6 202.7 −3.0 −1.7 0.55 0.55
Cereals/cereal products (g/d) 166.6 177.6 7.4 4.7 0.44 0.43
Meat/meat products (g/d) 68.5 68.1 0.0 0.0 0.62 0.61
Fish/shellfish (g/d) 15.1 15.6 0.0 0.1 0.52 0.51
Eggs/egg products (g/d) 9.4 9.4 0.0 0.1 0.52 0.49
Added fat (g/d) 19.6 17.4 −1.9 −10.7 0.44 0.40
Sugar/confectionery (g/d) 26.9 24.9 −1.6 −8.5 0.59 0.53
Cakes (g/d) 44.7 37.9 −4.7 −14.2 0.59 0.54
Non-alcoholic beverages (g/d) 1445.4 1657.3 195.8 14.3 0.47 0.48
Alcoholic beverages (g/d) 80.1 76.1 −0.8 −1.8 0.77 0.76
Condiments/sauces (g/d) 11.6 11.0 −0.8 −7.7 0.51 0.48
Soups/bouillon (g/d) 26.6 26.6 0.0 0.2 0.50 0.48
Miscellaneous (g/d) 0.3 0.4 0.0 6.5 0.52 0.52
Nutrients
Energy (kJ/d) 6955.9 6896.1 −14.2 −0.9 0.53
Protein (g/d) 59.5 60.1 0.6 1.0 0.50 0.49
Fat (g/d) 64.9 61.7 −2.8 −4.5 0.53 0.41
Carbohydrates (g/d) 182.6 185.6 1.5 0.9 0.51 0.49
Ethanol (g/d) 5.8 5.6 −0.1 −4.6 0.77 0.77
β-Carotene (mg/d) 2250.8 2336.8 82.2 4.0 0.49 0.49
Vitamin C (mg/d) 93.6 99.7 5.9 6.7 0.45 0.45
Calcium (mg/d) 724.9 747.1 18.4 2.7 0.50 0.53

Studies examining the reproducibility of questionnaires assessing the frequency of consumption of specific food items, rather than nutrient intakes, may generate more variable correlation coefficients (Hartman et al. 1990). This arises because there is often a large number of days when a given food or food group is not consumed. For example, in rural areas of Malawi, animal source foods are consumed only very occasionally, and then irregularly (Nyambose et al. 2002). In such settings, the ratio of within-person to between-person variance (i.e., the variance ratio) for foods and food groups may be greater than for nutrients. For food groups in the Dutch study, Spearman rank order correlation coefficients ranged from 0.45 to 0.92 (Ocké et al. 1997a). In the Heidelberg study, the Spearman rank correlations between the repeated measurements for food groups ranged from 0.41 for vegetables to 0.73 for alcoholic beverages for men and 0.44 for cereals and added fat to 0.77 for alcoholic beverages for women (Nagel et al. 2007). Reproducibility of the Willett semiquantitative food frequency questionnaire for foods and food groups has also been examined in 736 women and 649 men, with an average intraclass correlation of 0.64 for foods in both sexes (Gu et al. 2023). Beverages had the highest reproducibility, whereas eggs and meat had lower reproducibility. For food groups, the average intraclass correlation was 0.71 in women and 0.72 in men, with the highest correlation for alcohol and the lowest for poultry.

Studies have also examined the reproducibility of web-based semi-quantitative frequency questionnaires. For instance, Fallaize et al. 2014 examined the reproducibility of a 157‑item online frequency questionnaire among 100 participants recruited from a university population in the UK. Participants completed the questionnaire twice, 4wks apart. Unadjusted correlation coefficients for nutrients ranged from 0.65 for vitamin D to 0.90 for alcohol. For food groups, Spearman correlation coefficients ranged from 0.55 for tinned fruit or vegetables to 0.92 for alcoholic beverages (Table 6.10) (Fallaize et al. 2014). Given the sample, generalizability of the findings to the broader population may be limited.

Table 6.10. Spearman correlation coefficients (SCC) and cross-classification of quartiles of food group intake derived from repeat measures of the online Food4Me FFQ (n=100). From Fallaize et al. 2014.
Food group; SCC; Exact quartile agreement; Exact agreement plus adjacent quartile; Quartile disagreement; Extreme quartile disagreement
Rice, pasta, grains and starches .78 56 92 8 0
Savories (lasagne, pizza) .70 52 89 9 2
White bread (rolls, tortillas, crackers) .83 62 94 6 0
Wholemeal, brown breads, and rolls .77 60 91 8 1
Breakfast cereals and porridge .90 67 96 4 0
Biscuits .56 48 86 11 3
Cakes, pastries and buns .64 52 87 10 3
Milk .74 49 91 7 2
Cheeses .66 52 87 11 2
Yogurts .79 58 97 2 1
Ice cream, creams and desserts .77 55 94 4 2
Eggs and egg dishes .69 56 81 16 3
Fats and oils (eg, butter, low-fat spreads, hard cooking fats) .64 46 89 8 3
Potatoes and potato dishes .61 51 86 9 5
Chipped, fried & roasted potatoes .61 57 87 12 1
Peas, beans and lentils and vegetable and pulse dishes .75 62 92 7 1
Green vegetables .76 54 93 6 1
Carrots .68 54 90 8 2
Salad vegetables (eg, lettuce) .77 58 92 6 2
Other vegetables (eg, onions) .85 56 97 2 1
Tinned fruit or vegetables .55 86 86 14 0
Bananas .81 60 95 5 0
Other fruits (eg, apples, pears, oranges) .86 61 97 3 0
Nuts and seeds, herbs and spices .77 68 84 13 3
Fish and fish products/dishes .84 57 95 5 0
Bacon and ham .88 73 97 3 0
Red meat (eg, beef, veal, lamb, pork) .74 67 90 9 1
Poultry (chicken and turkey) .75 54 93 6 1
Meat products (eg, burgers, sausages, pies, processed meats) .85 62 93 7 0
Alcoholic beverages .92 71 99 1 0
Sugars, syrups, preserves, and sweeteners .78 80 94 6 0
Confectionary and savory snacks .73 52 92 8 0
Soups, sauces, and miscellaneous foods .69 59 90 8 2
Teas and coffees .85 69 96 3 1
Other beverages (eg, fruit juices, carbonated beverages, squash) .75 54 95 4 1
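
The cross-classification summarized in Table 6.10 can be sketched as follows. This is a minimal illustration rather than the procedure used by Fallaize et al. 2014. It assumes that quartile cut points are taken from each administration separately, that "disagreement" means two quartiles apart, and that "extreme disagreement" means opposite quartiles (an inference from the way the table columns sum to 100). The function names and data are hypothetical.

```python
import numpy as np


def quartile_index(values):
    """Assign each value to a quartile (0 to 3) using the sample's own cut points."""
    values = np.asarray(values, dtype=float)
    cuts = np.percentile(values, [25, 50, 75])
    return np.searchsorted(cuts, values, side="right")


def cross_classify(first, second):
    """Percentage of participants in each agreement category."""
    gap = np.abs(quartile_index(first) - quartile_index(second))
    n = len(gap)
    return {
        "exact": float(100 * np.sum(gap == 0) / n),
        "exact_plus_adjacent": float(100 * np.sum(gap <= 1) / n),
        "disagreement": float(100 * np.sum(gap == 2) / n),  # two quartiles apart
        "extreme": float(100 * np.sum(gap == 3) / n),       # opposite quartiles
    }


# Hypothetical intakes for eight participants on two occasions
first = [1, 2, 3, 4, 5, 6, 7, 8]
second = [2, 1, 7, 4, 6, 3, 8, 5]
result = cross_classify(first, second)
print(result)
```

For these hypothetical data, 62.5% of participants fall in the same quartile on both occasions, 87.5% fall in the same or an adjacent quartile, and 12.5% are two quartiles apart.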

In a meta-analysis considering the reproducibility of frequency questionnaires to examine nutrient intake among healthy populations aged 8 to 86y, Cui et al. 2021 identified 123 studies published up to July 2020. Almost a third were conducted in each of the Americas and Asia, over a quarter were conducted in Europe, and approximately 5% were conducted in each of Africa and Oceania. The median sample size was 112 participants, ranging from 14 to 1981. Cui et al. 2021 found that the pooled crude intraclass correlation coefficients were above 0.5 for energy and the macronutrients. With adjustment for energy, the pooled intraclass correlation coefficients for macronutrients ranged from 0.420 for n-3 PUFA to 0.803 for alcohol. For micronutrients, the pooled adjusted intraclass correlation coefficients ranged from 0.507 for iodine to 0.712 for vitamin B6 (Table 6.11). In studies examining short-term reproducibility (defined as an interval of ≤6mos between repeated administrations of the questionnaire), the median pooled intraclass correlation was 0.643 compared to 0.652 for studies examining longer-term reproducibility (>6mos between administrations).

Table 6.11. Pooled effect estimates and heterogeneity of the correlation coefficients for the reproducibility of FFQ for micronutrients.
ICC, intraclass correlation coefficient; CI, confidence interval; I², inconsistency index; N/A, not available. Data from Cui et al. 2021.
Crude; Energy adjusted; Crude; Energy adjusted
Nutrient N ICC (95% CI) I² N ICC (95% CI) I² N ICC (95% CI) I² N ICC (95% CI) I²
Vitamin A 27 0.623 (0.544, 0.692) 95.2 12 0.597 (0.464, 0.705) 92.2 42 0.613 (0.570, 0.651) 87.2 22 0.553 (0.470, 0.627) 89.8
Retinol 18 0.589 (0.513, 0.656) 85.3 9 0.537 (0.421, 0.635) 74 49 0.573 (0.537, 0.607) 80.6 38 0.513 (0.460, 0.562) 84.3
Carotene 9 0.632 (0.499, 0.735) 97.1 5 0.512 (0.328, 0.658) 0.86 25 0.605 (0.558, 0.649) 89 21 0.510 (0.427, 0.584) 90.8
β-Carotene 21 0.677 (0.630, 0.719) 76.1 6 0.613 (0.456, 0.733) 81.9 39 0.613 (0.573, 0.649) 72.3 28 0.554 (0.513, 0.593) 56.5
Vitamin C 47 0.665 (0.600, 0.722) 96.1 22 0.635 (0.526, 0.723) 94.9 92 0.623 (0.594, 0.650) 85.3 57 0.596 (0.555, 0.633) 83.6
Vitamin D 16 0.678 (0.546, 0.777) 98.4 5 0.671 (0.391, 0.837) 98.1 30 0.617 (0.572, 0.659) 83.5 15 0.560 (0.475, 0.635) 81.5
Vitamin E 34 0.665 (0.576, 0.738) 97.5 15 0.606 (0.484, 0.704) 94.4 52 0.626 (0.583, 0.667) 91.4 30 0.555 (0.490, 0.613) 87
Vitamin K 3 0.656 (0.430, 0.804) 97.4 2 0.693 (0.652, 0.729) 0 7 0.602 (0.511, 0.679) 60.4 5 0.658 (0.553, 0.742) 32.7
Thiamin 31 0.630 (0.587, 0.670) 87.3 12 0.606 (0.492, 0.699) 93.1 55 0.606 (0.579, 0.633) 74.3 39 0.522 (0.475, 0.566) 79.5
Riboflavin 28 0.667 (0.616, 0.712) 91.7 10 0.619 (0.483, 0.726) 94.7 54 0.640 (0.611, 0.667) 81 35 0.581 (0.528, 0.628) 85.4
Niacin 22 0.667 (0.609, 0.718) 89.4 10 0.605 (0.499, 0.693) 90.7 39 0.643 (0.573, 0.704) 94.3 34 0.517 (0.452, 0.576) 86.2
Vitamin B6 13 0.723 (0.522, 0.847) 98.4 5 0.712 (0.516, 0.838) 96.6 27 0.610 (0.553, 0.662) 78.8 19 0.555 (0.483, 0.619) 77.2
Folate 25 0.637 (0.582, 0.686) 90.5 6 0.597 (0.495, 0.684) 76.4 49 0.612 (0.577, 0.646) 81.6 26 0.605 (0.544, 0.659) 82.5
Vitamin B12 13 0.678 (0.507, 0.797) 97.7 7 0.683 (0.496, 0.809) 96.8 28 0.635 (0.577, 0.686) 82.3 21 0.575 (0.490, 0.648) 87.5
Se 11 0.661 (0.608, 0.709) 69.3 4 0.586 (0.429, 0.709) 78.7 15 0.648 (0.586, 0.702) 82.4 11 0.568 (0.446, 0.670) 87.6
Mg 19 0.674 (0.612, 0.728) 88.4 6 0.617 (0.492, 0.717) 87.5 30 0.669 (0.603, 0.725) 89.9 19 0.629 (0.544, 0.701) 86.8
Ca 52 0.635 (0.588, 0.676) 91.8 23 0.642 (0.566, 0.708) 91 87 0.622 (0.594, 0.649) 83 55 0.586 (0.545, 0.626) 84.1
Fe 39 0.640 (0.581, 0.692) 93.9 19 0.564 (0.496, 0.625) 79.4 75 0.613 (0.582, 0.642) 83.8 47 0.570 (0.525, 0.612) 82.8
I 2 0.499 (0.338, 0.632) 73.7 3 0.507 (0.421, 0.585) 35 2 0.828 (0.724, 0.894) 19.8 1 0.744 (0.600, 0.841) N/A
Zn 26 0.595 (0.556, 0.631) 68.8 12 0.573 (0.495, 0.641) 72.3 26 0.623 (0.565, 0.675) 85.3 18 0.597 (0.507, 0.675) 86.7
Cu 4 0.658 (0.620, 0.693) 0 1 0.690 (0.646, 0.728) N/A 6 0.748 (0.620, 0.837) 86.6 5 0.726 (0.608, 0.813) 86.6
K 25 0.672 (0.598, 0.736) 95.4 7 0.637 (0.486, 0.752) 93.6 49 0.637 (0.605, 0.667) 80 34 0.608 (0.566, 0.647) 73.4
P 23 0.605 (0.521, 0.676) 91.9 9 0.635 (0.544, 0.711) 80.7 43 0.621 (0.575, 0.662) 83.4 30 0.579 (0.521, 0.630) 82.1
Na 25 0.652 (0.499, 0.766) 98.2 8 0.670 (0.474, 0.802) 97 41 0.623 (0.582, 0.661) 83.9 30 0.552 (0.489, 0.609) 86.7
Mn 2 0.621 (0.382, 0.781) 64.8 N/A N/A N/A 5 0.655 (0.596, 0.707) 0 2 0.719 (0.645, 0.779) 0

Cui et al. 2021 found heterogeneity across studies to be high for energy and most macronutrients and micronutrients, underscoring that reproducibility varies across questionnaires, as well as dietary components. There was some variation in intraclass correlations by sex, and the pooled median intraclass correlation was higher for adults aged 18 to 50y (0.671) compared to adolescents (0.524) and adults >50y (0.659) (Figure 6.2).

Figure 6.2. Reproducibility of food frequency questionnaires (FFQ) stratified by age. Values represent pooled intraclass correlation coefficients (ICC) and Spearman correlation coefficients (SCC), with 95% confidence intervals. ICCs are presented in (A) and SCCs in (B). From Cui et al. 2021.

Reproducibility of food frequency questionnaires administered to children may be lower because of difficulty conceptualizing the time frame used and averaging consumption frequencies and portion sizes over time (Frank 1994). Several food frequency questionnaires have been developed and tested for use with children (Hammond et al. 1993; Huybrechts et al. 2008; Buch-Andersen et al. 2016; Mouratidou et al. 2019; Hafizah et al. 2019) and adolescents (Rockett et al. 1997; Ambrosini et al. 2009; Overby et al. 2014; Bjerregaard et al. 2018; Larroya et al. 2024). In a study in Italy by Filippi et al. 2014, 185 adolescents aged 14-17y completed a 106‑item web-based semi-quantitative questionnaire twice, one month apart. For nutrients, the energy-adjusted intraclass correlation coefficients ranged from 0.23 for cholesterol to 0.73 for ethanol. Intraclass correlation coefficients ≤0.40 were observed for 11 of 24 food groups (Table 6.12). Table 6.12 also shows lower and upper limits of agreement obtained using the Bland-Altman statistical method (Section 6.3.2).

Table 6.12. Intraclass correlation coefficients, exponentiated mean difference and 95% limits of agreement (LOA) of food groups daily intake, performed on transformed, energy-adjusted data. From Filippi et al. 2014.
Food groups ICC Mean difference (%) P-value t test Lower limit (%) Upper limit (%)
Vegetables (g) 0.46 99.97 0.952 89.68 111.44
Fresh fruit (g) 0.54 100.25 0.654 87.74 114.55
Dried fruit (g) 0.03 100.00 0.857 99.51 100.50
Nuts (g) 0.22 100.00 0.797 99.92 100.08
Legumes (g) 0.14 99.97 0.803 97.36 102.66
Breakfast cereals (g) 0.27 99.97 0.639 98.35 101.62
White bread (g) 0.34 99.94 0.765 95.55 104.54
Bread substitutes (g) 0.21 100.06 0.544 97.62 102.57
Pasta/rice/couscous (g) 0.36 100.10 0.606 95.57 104.85
Potatoes (g) 0.37 99.98 0.889 95.83 104.29
Sweets (g) 0.43 99.89 0.639 94.35 105.75
Cheeses/yogurt (g) 0.41 99.89 0.614 95.07 104.97
Fishery products (g) 0.40 99.84 0.152 97.19 102.56
Meat (g) 0.41 99.56 0.040 94.61 104.76
Eggs (g) 0.37 100.02 0.557 99.22 100.83
Animal fats (g) 0.44 100.00 0.827 99.90 100.10
Oils (g) 0.23 99.94 0.398 98.34 101.57
Savoury food (g) 0.41 100.23 0.475 92.81 108.24
Water (ml) 0.47 102.05 0.611 39.63 262.83
Soft drinks (ml) 0.49 100.95 0.241 83.48 122.06
Fruit juice (ml) 0.41 100.59 0.274 88.64 114.15
Milk (ml) 0.56 100.50 0.318 89.30 113.11
Tea/coffee (ml) 0.56 100.09 0.649 95.38 105.04
Alcoholic drinks (ml) 0.51 99.73 0.286 93.99 105.8
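
The exponentiated mean difference and limits of agreement shown in Table 6.12 can be illustrated with a brief sketch. It assumes a natural-log transformation and the conventional mean ± 1.96 SD limits of the Bland-Altman method. The transformation and energy adjustment actually applied by Filippi et al. 2014 may differ, and the function name and intake values are hypothetical.

```python
import numpy as np


def log_scale_agreement(first, second):
    """Return (mean difference, lower limit, upper limit) as percentages.

    Differences are taken on the natural-log scale and then exponentiated,
    so 100 means no difference between the two administrations.
    Assumption: all intakes are strictly positive (log of zero is undefined).
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    diff = np.log(first) - np.log(second)
    mean_diff = diff.mean()
    sd_diff = diff.std(ddof=1)  # sample standard deviation of the differences
    lower = mean_diff - 1.96 * sd_diff
    upper = mean_diff + 1.96 * sd_diff
    return tuple(float(100 * np.exp(v)) for v in (mean_diff, lower, upper))


# Hypothetical intakes (g/d) from two administrations
first = [120.0, 250.0, 90.0, 300.0, 180.0]
second = [110.0, 260.0, 100.0, 280.0, 200.0]
mean_pct, lower_pct, upper_pct = log_scale_agreement(first, second)
print(round(mean_pct, 1), round(lower_pct, 1), round(upper_pct, 1))
```

On this scale a mean difference close to 100% indicates little systematic change between administrations, while the width of the interval between the lower and upper limits reflects how much individual responses differ.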

Reproducibility may vary not only with the population group and dietary component but also with the questionnaire design. In their meta-analysis, Cui et al. 2021 found that pooled intraclass correlations for longer questionnaires (>120 items) ranged from 0.512 to 0.825 compared to a range of 0.310 to 0.764 for shorter questionnaires. Additionally, the median pooled intraclass correlation was somewhat smaller for questionnaires focused on a period shorter than 12mos (0.622) compared to those aiming to capture intake over 12mos or more (0.659). Intraclass correlations were similar for interviewer- and self-administered questionnaires (Cui et al. 2021).

Additional design factors potentially influencing reproducibility include the range of response options for consumption frequency and portion size. If less variability is permitted in these two response categories, correlations between repeat administrations of the food frequency questionnaire will be higher. The adequacy of instructions to the respondent can also influence reproducibility, as can data-handling procedures. For example, for paper-based questionnaires, optical scanning can reduce coding errors and thus improve the apparent reproducibility (Ocké et al. 1997b). With web-based questionnaires, automated skip patterns and coding may reduce errors.

6.2 Sources of true variability in nutrient intakes

If measurement errors are minimized using the strategies described in Chapter 5, the reproducibility of any method for assessing the nutrient intake at the group or individual level is a function of the overall true variability in intake. This is determined by the between- and within-person variation. In any dietary assessment procedure, replicate observations are not possible; hence, as noted previously, within-person variation cannot be separated statistically from the measurement errors described in Chapter 5. Nevertheless, if the quality-control procedures outlined in Section 5.2 are used, the confounding effects of measurement error on within-person variability will be small. Both within- and between-person variability can be estimated statistically using analysis of variance, as long as two or more days of intake are available on at least a subsample of the population. Hence, within- and between-person variability cannot be calculated when a food frequency questionnaire or a dietary history is used, because these methods yield a single estimate of usual intake rather than intakes on separate days.

Other sources of variance, discussed below, can also be estimated using analysis of variance techniques, provided that the initial design of the dietary assessment protocol allowed for their occurrence. Selecting the correct initial study design is thus of great importance.
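
The separation of within- and between-person variance by analysis of variance can be sketched as follows. This is a minimal illustration that assumes a balanced design (the same number of days for every person) and a one-way random-effects model with no adjustment for day of week, season, or other factors. The function name and intake values are hypothetical.

```python
import numpy as np


def variance_components(intakes):
    """Estimate within- and between-person variance from repeated days.

    intakes: 2-D array with one row per person and one column per day.
    Returns (within-person variance, between-person variance, variance ratio).
    Assumption: balanced data and a one-way random-effects model.
    """
    data = np.asarray(intakes, dtype=float)
    n, k = data.shape  # n persons, k days per person
    person_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_between = k * np.sum((person_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - person_means[:, None]) ** 2) / (n * (k - 1))
    var_within = ms_within
    # The between-person mean square also contains within-person variance / k,
    # so it is subtracted out; negative estimates are truncated at zero.
    var_between = max((ms_between - ms_within) / k, 0.0)
    ratio = var_within / var_between if var_between > 0 else float("inf")
    return float(var_within), float(var_between), float(ratio)


# Hypothetical intakes (mg/d) for four persons on three days each
intakes = [
    [8.0, 10.0, 12.0],
    [10.0, 12.0, 14.0],
    [14.0, 16.0, 18.0],
    [16.0, 18.0, 20.0],
]
var_w, var_b, ratio = variance_components(intakes)
print(var_w, var_b, round(ratio, 2))
```

For these hypothetical data the within-person variance is 4, the between-person variance is 12, and the variance ratio is about 0.33. Real intake data more often yield ratios above 1.0, as discussed in the sections that follow.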

6.2.1 Between-person variation

Individuals differ from each other in their usual daily food intake. The between-person variation is a measure of these differences, which, in turn, depend on the nutrient and the characteristics of the study group. If between-person variation is large relative to within-person variation, individuals can be readily distinguished so that the usual nutrient intakes of individuals can be characterized.

Unfortunately, for most nutrients, there is more variability in nutrient intakes within individuals than between them (Beaton et al. 1979; 1997). Hence, the within- to between-person variance ratio is greater than 1.0. This partly explains why the mean intake of a group can usually be assessed more readily than the usual mean intake of an individual. To allow for the effect of between-person variation on group mean nutrient intakes, the sample size should be as large as possible, and representative of the group to be studied.

6.2.2 Age and sex effects

Variation in nutrient intakes resulting from age and sex differences of the participants contributes to between-person variation. As a result, energy and nutrient intakes should typically be presented separately by sex. If the variation in age is very large, the results will also need to be broken down into several separate age groups.

6.2.3 Within-person variation

The within-person variance is a measure of the true day-to-day variation in the dietary intake of an individual. When estimated by analysis of variance, it represents the sum of true variation in the day-to-day intake of a person, plus all other random variation, including measurement error, that remains in the data set. The within-person variance (s²w) may be expressed as a standard deviation (sw) or as a coefficient of variation (CVw), where

CVw = sw / (mean level of intake)

The only way to reduce the effect of the within-person variance on the mean daily intake is to increase the number of measurement days for each person in the group. The number of measurement days required depends on the desired precision of the estimate.
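
How the required number of days depends on the desired precision can be illustrated with a commonly used approximation, n = (Z × CVw / D0)², where Z is the normal deviate for the chosen confidence level and D0 is the acceptable deviation of the observed mean from the person's true usual intake, expressed as a percentage. The sketch below assumes that daily intakes are approximately normally distributed and independent across days. The symbols Z and D0, the function name, and the example values are illustrative assumptions rather than values taken from this chapter.

```python
import math


def days_needed(cv_within_pct, precision_pct, z=1.96):
    """Days of intake needed per person for a given precision.

    cv_within_pct: within-person coefficient of variation (CVw), in percent.
    precision_pct: acceptable deviation (D0) from true usual intake, in percent.
    z: normal deviate for the confidence level (1.96 for 95%).
    """
    if cv_within_pct < 0 or precision_pct <= 0:
        raise ValueError("CVw must be non-negative and precision must be positive")
    return math.ceil((z * cv_within_pct / precision_pct) ** 2)


# Hypothetical nutrients with low and high day-to-day variability
low_variability = days_needed(cv_within_pct=30, precision_pct=20)
high_variability = days_needed(cv_within_pct=60, precision_pct=20)
print(low_variability, high_variability)
```

With a hypothetical CVw of 30% and a tolerance of ±20%, 9 days are needed; doubling CVw to 60% raises the requirement to 35 days, because the number of days increases with the square of the within-person variability.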

Measurement days should represent the population of days to be studied. For example, the days can be closely spaced to eliminate bias from seasonal changes, and the days should be selected such that weekend days as well as weekdays are proportionately included. Most researchers suggest that nonadjacent days should be chosen to avoid the effects of autocorrelation of consecutive daily intakes (National Research Council 1986; Hartman et al. 1990; Institute of Medicine 2000).

Minimizing within-person variation by increasing the number of measurement days does not affect the between-person variation; it simply enables the usual intake of individuals to be characterized more precisely.

Within-person (and between-person) variation is a function of the nutrient of interest. Generally, for nutrients found in high concentrations in a few foods that are consumed only occasionally, such as vitamin A, vitamin D, sodium, cholesterol, and linoleic acid, within-person variation is high, making it more difficult to obtain precise estimates of the usual intakes of these nutrients for individuals; conversely, within-person variation is lower for nutrients found in many foods, such as carbohydrate and protein (Gibson et al. 1985).

The extent of within-person variation recorded during dietary assessment depends, in part, on the variety versus monotony in the diet of an individual. As described above, the ratio of within- to between-person variation may be lower in lower-income compared to high-income countries, which may be attributable to less variety in intake (Gibson et al. 2017). For example, in Indonesia, where a more limited number of foods is consumed and where consumption is more closely linked with income than with food availability, within-person variation is less than between-person variation (Persson et al. 2001). In contrast, among Malawian female subsistence farmers, higher within- to between-person variance ratios were reported, and these higher ratios were attributed to the occasional and irregular use of animal foodstuffs (Nyambose et al. 2002).

As described earlier, within-person variation may also be lower than between-person variation in young children. Erkkola et al. 2011, analyzing data from three consecutive food records for >1600 children participating in a birth cohort in Finland, found that within-person variance was generally smaller than between-person variance for 1y old children. The ratios of within- to between-person variation were therefore less than 1.0 for most nutrients, except for cholesterol, vitamin A, β-carotene, and vitamin B12. Similar findings were observed for children aged 13-32mos based on a cross-sectional study in Brazil (Padilha et al. 2017). However, among 3y and 6y old children, Erkkola et al. 2011 found that the ratios were greater than 1.0 for most nutrients examined. Others have noted that the within-person variation is larger than the between-person variation in older children (Ollberding et al. 2014; Caswell et al. 2020). For example, ratios ranging from 4.5 for vitamin B6 to 31.3 for vitamin C were observed based on 24h recall data for 200 4-8-y-old children in Zambia (Caswell et al. 2020). The ratios were generally slightly lower after adjusting for season, interviewer, and whether the recall period fell on a market day. Variance ratios were observed to be higher than 1.0 and similar by racial/ethnic identity among adolescents aged 12y to 17y participating in the U.S. National Health and Nutrition Examination Survey, while high variability was observed for children aged 6 to 11y (Ollberding et al. 2014). These and other estimates of the within-person variation in nutrient intakes are crucial to calculating the number of days needed to characterize average usual intake with a specified level of precision.

Physiological factors may also influence within-person variation. Studies have shown that energy intakes vary across the menstrual cycle. In a narrative review, Rogan and Black 2023 note that despite the lack of high-quality research and heterogeneity across studies, energy intake appears to be higher in the luteal phase of the menstrual cycle, with the lowest intake likely during the late-follicular and ovulatory phases. The authors note that most studies reported differences of around 200‑350kcal/day between time points in the menstrual cycle. They also note that the size of these differences likely varies between individuals and between cycles. Similarly, based on a meta-analysis of 15 studies, Tucker et al. 2025 found higher energy intake, averaging 168kcal/day, in the luteal compared to the follicular phase.

As stated previously, the within-person variation is usually larger than between-person variation. Consideration of within-person variation is particularly important for assessing the prevalence of inadequate intakes in a population group (Level two studies, Section 3.3.2), for ranking individuals within a group (Level three studies, Section 3.3.3), or when data on usual intakes of individuals are required for correlation or regression analysis with biochemical or clinical parameters at the individual level (Level four studies, Section 3.3.4) (Carriquiry 1999).

Prevalence estimates of inadequate intakes for a specific nutrient are influenced by within-person variation, particularly when the observed distribution of nutrient intakes is based on a single measurement for each person (e.g., one 24h recall). In such cases, the mean or median intake for the group may be adequately estimated, but the within-person variability distorts estimates of the percentiles above and below the mean by increasing the total variance of the distribution. As a result, the distribution of observed intakes is wider and flatter than the distribution of usual intakes.
Figure 6.3
Figure 6.3. Prevalence estimates of excessive and inadequate intakes in both 1d and multiday surveys of the same population. The prevalence estimated from multiday surveys (dark-shaded areas) is much less than the prevalence estimated from the 1d survey.
Figure 6.3 shows the effect of using a single day's intake versus multiple days to characterize usual intakes of individuals. In this example, estimates based on a single day's intake will exaggerate the prevalence of both inadequate and excessive intakes in the population (Beaton 1982). The extent of the bias in the prevalence estimates depends on the within-person variation in intake; it cannot be diminished by increasing the sample size. Kirkpatrick et al. 2022 discuss considerations in using data from 24h recalls to make inferences about usual intakes.
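The widening of the observed distribution can be illustrated numerically. The following is a minimal sketch, assuming normally distributed intakes and purely hypothetical values for the group mean, cutoff, and the between- and within-person standard deviations; the function name `prevalence_below` is introduced here for illustration only. It compares the apparent prevalence of intakes below a cutoff when each person contributes the mean of 1, 3, or 7 days with the prevalence based on usual intakes.

```python
from statistics import NormalDist


def prevalence_below(cutoff, mean, sd_between, sd_within, n_days):
    """Proportion of the observed distribution below `cutoff` when each
    person's intake is the mean of `n_days` days (normal approximation)."""
    # Observed variance = between-person variance + within-person variance / days
    sd_observed = (sd_between ** 2 + sd_within ** 2 / n_days) ** 0.5
    return NormalDist(mean, sd_observed).cdf(cutoff)


# Hypothetical values (assumptions, not survey data): group mean 100,
# cutoff 70, between-person SD 20, within-person SD 30.
for days in (1, 3, 7):
    print(days, "day(s):", round(prevalence_below(70, 100, 20, 30, days), 3))

# Prevalence based on the distribution of usual intakes (no within-person noise)
usual = NormalDist(100, 20).cdf(70)
print("usual intake:", round(usual, 3))
```

Under these assumptions the apparent prevalence falls toward the usual-intake value as more days are averaged, and no increase in the number of people surveyed changes the one-day figure.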

Disease-diet relationships tend to be obscured and appear less significant if within-person variation in nutrient intake is ignored. Correlations at the individual level between diet and disease are lowered by within-person variation. The theoretical reduction in the absolute value of the correlation coefficient can be calculated from the ratio of the within-person to between-person variance and the number of replicate observations, provided that the sample size is large (i.e., >100). For example, if the observed variance ratio is 2.0, as determined from three separate dietary intake measurements (such as three 24h recalls), the correlation coefficient (r) between the estimated intake and some biochemical parameter is expected to be about 77% of the true correlation, based on the theoretical attenuation factor from Anderson 1988. Hence, the calculated correlation coefficient can be corrected by dividing by 0.77 before testing the significance of the r value. However, with small sample sizes (i.e., <100), this correction is not advised because the sampling error associated with the correlation coefficient may be too large.

Attenuation may also reduce the significance of regression. Attenuation factors corresponding to different ratios of within- to between-person variance and different numbers of measurements are also available for simple linear regression. As an example, if the variance ratio equals 2.0 and three measurements of dietary intake (e.g., three 24h recalls) are used, the regression coefficient of a biological variable on the estimated value of the dietary factor is 60% of the true coefficient. Such a correction must be made with caution, as noted for correlation coefficients.
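The two attenuation factors quoted above can be reproduced with the classical measurement-error expressions, 1/√(1 + λ/n) for a correlation coefficient and 1/(1 + λ/n) for a simple linear regression coefficient, where λ is the within- to between-person variance ratio and n the number of replicate measurements per person. The sketch below is illustrative: the function names and the observed correlation of 0.30 are hypothetical, and it is an assumption that these are the same expressions tabulated in the source cited.

```python
import math


def correlation_attenuation(variance_ratio, n_replicates):
    """Factor by which an observed correlation is reduced."""
    return 1.0 / math.sqrt(1.0 + variance_ratio / n_replicates)


def regression_attenuation(variance_ratio, n_replicates):
    """Factor by which an observed simple regression slope is reduced."""
    return 1.0 / (1.0 + variance_ratio / n_replicates)


ratio, n = 2.0, 3  # values used in the text: variance ratio 2.0, three recalls
print(round(correlation_attenuation(ratio, n), 2))  # 0.77
print(round(regression_attenuation(ratio, n), 2))   # 0.6

# De-attenuating a hypothetical observed correlation of 0.30
observed_r = 0.30
print(round(observed_r / correlation_attenuation(ratio, n), 2))  # 0.39
```

As the text cautions, dividing by the attenuation factor is only sensible with large samples, because the sampling error of the observed coefficient is scaled up by the same correction.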

Finally, within-person variation and the use of data from a limited number of record or recall days will result in errors when the individuals are classified into terciles, quartiles, or quintiles, based on their nutrient intakes (Anderson 1988). This classificatory approach is frequently used to examine associations of dietary intake and chronic disease (Section 3.3.3). For example, relative risks can be computed for each of the four lower quintiles by treating the uppermost quintile of intake as the reference quintile (Table 3.4) (Yuan et al. 2019). However, because of misclassification, the relative risks are attenuated, and it becomes more difficult to detect the strength of association between diet and disease. Only when the classification is based on sound estimates of the usual daily intake of the nutrient will the relative risks be more meaningful.

6.2.4 Day-of-the-week effects

Group mean nutrient intakes per day and individual usual intakes per day may both vary with the day of the week. For example, using 7d food record data from the Danish National Survey of Diet and Physical Activity 2011-2013, Nordman et al. 2020 observed higher intakes of energy, added sugars, discretionary foods, alcohol, and beer and wine, as well as higher energy density on Friday and Saturday/Sunday compared to weekdays. In contrast, intakes of fruit, vegetables, fiber, and wholegrain products were higher on weekdays compared to Friday and Saturday/Sunday. Minor differences were observed by gender, whereas variation in intake from weekdays to weekends was most evident for children and least evident for older adults. Among a sample of 79 middle-aged White women who completed seven 24h recalls over 14d, Ma et al. 2009 found that energy intake was lower on Friday compared to Sunday (Table 6.13). Similarly, Beaton et al. 1979 demonstrated that women, not men, ate more food on Sundays than weekdays. However, such sex differences have not always been observed. For example, van Staveren et al. 1982 demonstrated that both sexes had lower intakes of dietary fiber on weekends.

Table 6.13. Unadjusted and adjusted energy estimates by call sequence and day of the week: Results of linear mixed models, The Energy Study (N = 79), Worcester, Massachusetts, June-October 1997. From Ma et al. 2009.
                    N     Unadjusted energy intake   p Value     Adjusted energy intake   p Value
                          Mean (SE)                              Mean (SE)
Call sequence
  1                 79    1672.3 (57.2)              Reference   1500.9 (200.9)           Reference
  2                 79    1865.4 (84.9)              0.02        2246.4 (156.5)           0.007
  3                 79    1907.7 (84.4)              0.003       2315.3 (139.5)           0.001
  4                 79    1946.6 (83.3)              0.005       1704.3 (200.5)           0.52
  5                 79    1853.4 (73.0)              0.04        1667.7 (189.4)           0.56
  6                 79    1716.6 (73.6)              0.62        1513.1 (193.6)           0.97
  7                 79    1817.7 (69.5)              0.11        1831.7 (197.3)           0.25
Days of the week
  Sunday            72    1937.0 (73.1)              Reference   1906.1 (83.2)            Reference
  Monday            80    1751.0 (76.5)              0.16        1858.0 (79.5)            0.59
  Tuesday           109   1721.0 (62.2)              0.01        1813.1 (75.3)            0.27
  Wednesday         69    1674.7 (78.2)              0.007       1786.8 (76.9)            0.19
  Thursday          53    1887.6 (87.4)              0.55        1915.9 (80.4)            0.99
  Friday            84    1922.5 (76.0)              0.04        1746.0 (64.5)            0.04
  Saturday          86    1922.8 (81.2)              0.61        1797.5 (73.2)            0.14

Not all nutrients exhibit a weekend effect: for those for which there are large within- and between-person fluctuations in daily intakes (e.g., cholesterol, vitamin A, and sodium), a weekend effect may not always be evident (Gibson et al. 1985). Maisey et al. 1995, however, reported higher intakes on Sundays of vegetable-derived micronutrients in their population of older adults, especially for the intakes of carotene, retinol equivalents, folate, vitamin C, pantothenate, and zinc. The effect of day of week is particularly pronounced for alcohol. Nordman et al. 2020 found that mean alcohol intake was higher by 94%, 81%, and 42% on Saturdays compared to weekdays among individuals aged 16‑24y, 25‑59y, and 60‑75y, respectively.

The weekend effect on nutrient intakes sometimes disappears when nutrients are expressed in terms of nutrient densities (Beaton et al. 1979; Gibson et al. 1985). This finding suggests that the food consumption patterns are comparable for weekdays and weekend days, but that total energy intakes differ. Again, this is not always the case. In the study of Maisey et al. 1995, the higher intake of vegetable-derived micronutrients on Sundays was still observed when these nutrients were expressed in terms of the nutrient density. As noted above, Nordman et al. 2020 observed higher energy density on Friday and Saturday/Sunday compared to weekdays.

Day-of-the-week effects can be accounted for by representing all days of the week in the study design (Sempos et al. 1985). It is probably not sufficient to simply proportionately include weekend days and weekdays, especially for studies of older adults, whose intakes for some nutrients may vary over the course of the week (Maisey et al. 1995). In such cases, it is preferable to include each day of the week equally in the final study design. A variable indicating the day of the week or weekdays versus weekend days can be included in models to account for this nuisance effect (Nusser et al. 1995).

6.2.5 Seasonal effects

The effects of the different seasons of the year on food or nutrient intake depend on the population group, its socioeconomic status, and the country. Accordingly, the findings of studies on seasonality of intakes are inconsistent. Findings have also changed over time with increasing globalization of the food supply.

Research conducted from the 1980s to early 2000s suggested small seasonal effects on energy intakes in higher income countries (Sempos et al. 1984; Kim et al. 1984; van Staveren et al. 1986; Hartman et al. 1990; Palaniappan et al. 2003), and marked effects in lower income countries (Ross et al. 1986; Kigutha 1997). Intakes of certain nutrients, such as vitamin A, vitamin C, iron, and in some cases fat, appeared to show seasonal variation in both low-income and higher-income countries (van Staveren et al. 1986; Hartman et al. 1990; Kigutha 1997). Seasonal differences in intake of some foods and nutrients have also been observed in studies of children conducted in Kenya, Ghana, and Malawi (Ferguson et al. 1993; Gewa et al. 2007).

More recent evidence for seasonal variation in dietary intake is less convincing, at least in high-income settings. Based on a review of 20 studies conducted mostly in high-income countries, Fujihira et al. 2023 concluded there were similar but inconsistent findings across studies indicating that energy intake is higher in winter and spring and lower in summer. The authors posit that variation in energy intake across seasons is influenced by environmental, social, and physiological factors. However, Marti-Soler et al. 2017 examined energy and macronutrient intake among over 40,000 participants in three countries in the northern hemisphere (France, Russia, Switzerland) and one in the south (New Zealand) and found that most nutrients did not vary significantly by season. Using data from Switzerland, the authors considered three time periods between 1993 and 2012 and found that seasonal variation declined over time (Marti-Soler et al. 2017). The authors concluded that seasonality is study-specific, with significant effects due to large sample sizes in some cases, and that the magnitude of seasonal variation is flattening with time. In a meta-analysis of 10 studies conducted in Japan (Adachi et al. 2025), seasonal variations in intakes of energy, nutrients, and foods were also inconsistent. Seasonal differences were observed for vegetables (higher in summer), fruits (higher in fall), and potatoes (higher in fall and winter), though heterogeneity across studies was high (Adachi et al. 2025).

The effect of season may be more considerable in low-income countries. Considering eight studies that reported energy intake for the pre-, post-, and harvest seasons, Stelmach-Mardas et al. 2016 observed that energy intake was highest in the post-harvest season. In a study conducted in Western Kenya, Waswa et al. 2021 found that season influenced the dietary diversity of women but not their children (aged 6 to 23mos), with higher diversity among women in November (post-harvest) compared to July/August. Among the women, differences in energy and macronutrient intake were not observed, but iron, calcium, and vitamin E intake varied by season. Mitchikpe et al. 2009 found that food patterns among children aged 6 to 8y in Benin differed by season, but corresponding differences in energy and nutrient intakes were not observed, with the exceptions of fat and vitamin C. However, in a study of children aged 4 to 8y in Zambia, Caswell et al. 2020 found seasonal effects of differing magnitudes for energy and all 14 nutrients examined using up to 7 recalls per child collected over 6mos. The contribution of the seasonal effect to total variance ranged from 2.9% for fat to 23% for riboflavin, with adjustment for the interviewer and whether the recall period fell on a market day (Table 6.3).

A seasonal effect can be considered by administering the survey over a long interval of time (e.g., 1y) and including randomly selected days representative of all seasons of the year (Hartman et al. 1990; Ocké et al. 1997b). If this is not done, investigators should consider the applicability of results obtained at one season to the rest of the year, depending on the context and population. Though season can potentially be accounted for in analyses, Marti-Soler et al. 2017 caution that no standard adjustment can be proposed, as an adjustment suitable for one study or nutrient might bias others. After examining differences in diet quality across seasons using food frequency data collected from 1993 to 1998 in the U.S.-based Women's Health Initiative, Crane et al. 2019 similarly concluded that the differences were not sufficiently large to warrant adjustment for season as a confounder when examining diet quality.

6.2.6 Sequence effect

Individuals may react to repeated interviews, showing a sequence or training effect that may result in changing reported nutrient intakes over time (Frank et al. 1984). This effect may be severe if individuals complete the recall or records on consecutive days (Sharma et al. 1998). The presence or absence of a sequence effect on group mean or individual intakes can be assessed by completing the interviews or records on randomly selected days of the week and recording their order.

Ma et al. 2009 found that estimated daily energy intake was lowest for the first 24h recall compared to subsequent recalls, regardless of the day of the week on which the first recall was completed. Relatedly, in a study of 11 pairs of participants designed to examine how different persons recalled eating the same foods, Novotny et al. 2001 found that the number of consumed foods omitted declined slightly from the first 24h recall to the second, suggestive of a training effect. However, significant effects were not observed when the analysis was stratified by sex. Other investigators have found no evidence of a sequence effect (Gibson et al. 1985; Maisey et al. 1995). Nonetheless, analyses, including estimating distributions of usual intake using data from multiple 24h recalls, often account for the sequence of the recalls as a nuisance effect (Nusser et al. 1995; Kirkpatrick et al. 2022).

6.2.7 Illness or dieting

The existence of chronic illness or dieting in certain subgroups of the population may adversely affect dietary intakes. This may lead to a bias in the prevalence of inadequate intakes in a population presumed to be normal and healthy (Van Staveren et al. 1994).

6.3 Statistical assessment of reproducibility

The following sections provide a brief account of the statistical methods for assessing reproducibility. Readers are advised to consult a standard statistics text for further information as well as to consult with statisticians to inform study design and data analysis.

Reproducibility is affected by between- and within-person variation, as noted earlier, and investigators should aim to design the dietary assessment protocol in such a way that these two sources of variability can be separated and estimated statistically. This can be achieved by using analysis of variance techniques (see below). Correlation coefficients can be used to assess the relative ordering of individuals by the repeated measurements (Willett 2013). Absolute levels may be compared, for example, through examination of means and standard deviations (Willett 2013). As different statistical tests provide different information, a combination of approaches may be used to lend confidence to the interpretation of findings from a given study.

6.3.1 Analysis of variance

Analysis of variance (ANOVA) assesses the differences, if any, in the group mean intake of each nutrient between the replicates, and it can be used to identify and estimate between- and within-person variability. The variance ratio (the ratio of within-person to between-person variation) can then be calculated. Variance ratios depend critically on the nutrient, sample size, number of measurement days per individual, dietary methodology, and probably age, sex, and sociocultural group (National Research Council 1986; Hartman et al. 1990) as described earlier. Hence, when comparing variance ratios from different studies, these factors should be taken into consideration.
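The variance components can be estimated from the mean squares of a one-way ANOVA with person as the grouping factor. The following is a minimal sketch for a balanced design (every person has the same number of days), using made-up intakes for four people over three days; the function name and data are hypothetical, and in practice unbalanced data and covariates usually call for mixed models fitted in statistical software.

```python
from statistics import mean


def variance_components(intakes):
    """Estimate (within-person, between-person) variance from a balanced
    one-way ANOVA. `intakes` is a list of per-person lists of daily intakes."""
    g = len(intakes)     # number of individuals
    n = len(intakes[0])  # number of days per individual
    if any(len(days) != n for days in intakes):
        raise ValueError("this sketch assumes a balanced design")
    person_means = [mean(days) for days in intakes]
    grand_mean = mean(person_means)
    # Mean squares between and within persons
    ms_between = n * sum((m - grand_mean) ** 2 for m in person_means) / (g - 1)
    ms_within = sum(
        (x - m) ** 2 for days, m in zip(intakes, person_means) for x in days
    ) / (g * (n - 1))
    s2_within = ms_within
    # E[MS_between] = s2_within + n * s2_between; truncate at zero if negative
    s2_between = max((ms_between - ms_within) / n, 0.0)
    return s2_within, s2_between


# Hypothetical daily intakes (arbitrary units) for 4 people x 3 days
data = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0],
    [30.0, 32.0, 34.0],
    [40.0, 42.0, 44.0],
]
s2_w, s2_b = variance_components(data)
print("within:", s2_w, "between:", round(s2_b, 2))
print("variance ratio:", round(s2_w / s2_b, 3))
```

In this contrived example the people differ far more from one another than from day to day, so the variance ratio is well below 1.0, the opposite of what is typically reported for adults.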

Once estimates of the between- and within-person variability have been calculated using analysis of variance, they can be used in the following equations to calculate, based on the standard error of the mean: (a) the reproducibility of the dietary method to estimate the usual intake of a group and (b) the reproducibility of the dietary method to estimate the usual intake of an individual (Cole and Black 1984).

equations
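As a hedged sketch of the quantities involved, and not necessarily in the notation of the source cited, the standard errors under a balanced one-way random-effects model can be written as follows. The symbols are assumptions introduced here: $s_b^2$ and $s_w^2$ are the between- and within-person variances, $g$ is the number of individuals, and $n$ is the number of days per individual.

```latex
% (a) standard error of the group mean usual intake
SE_{\text{group}} = \sqrt{\frac{s_b^2}{g} + \frac{s_w^2}{g\,n}}

% (b) standard error of one individual's mean intake over n days
SE_{\text{individual}} = \sqrt{\frac{s_w^2}{n}}
```

Both expressions shrink as the number of days increases, but only the group standard error benefits from adding more individuals.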

6.3.2 Mean and standard deviation of the difference

Bland and Altman 1986b recommended the use of the mean and standard deviation of the differences between the two replicates for comparing nutrient intakes at an individual level. They also suggested calculating the 95% limits (i.e., mean difference ±2SDs) for the difference between the two replicates, which they refer to as the limits of agreement (LOA). The mean difference provides information about the direction of bias, and a plot of the individual differences against the mean level of intake can indicate whether the bias is constant across levels of intake. The LOA indicate how far apart the two measurements are likely to be for most individuals, so that a judgment can be made as to whether the agreement reached between the two replicates is acceptable.
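The calculation can be sketched in a few lines. The example below is illustrative only: the function name and the energy intakes for five people on two occasions are hypothetical, and the multiplier of 2 follows the approximation given in the text.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second, k=2.0):
    """Return (mean difference, lower LOA, upper LOA) for paired replicates."""
    diffs = [b - a for a, b in zip(first, second)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation of the differences
    return d_mean, d_mean - k * d_sd, d_mean + k * d_sd


# Hypothetical energy intakes (kcal/day) from two administrations
first = [1800.0, 2100.0, 1950.0, 2400.0, 1700.0]
second = [1900.0, 2000.0, 2050.0, 2300.0, 1850.0]

bias, lower, upper = limits_of_agreement(first, second)
print("mean difference:", bias)
print("limits of agreement:", round(lower, 1), "to", round(upper, 1))

# Points for a Bland-Altman plot: (mean of the pair, difference)
points = [((a + b) / 2, b - a) for a, b in zip(first, second)]
print(points)
```

Whether limits of this width are acceptable is a subject-matter judgment that the statistics themselves do not settle.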

Figure 6.4 illustrates the use of the Bland-Altman method (Bland and Altman 1986a) for comparing estimated intakes of legumes, oils, meat, and savory food obtained from repeat administrations of a web-based food frequency questionnaire among adolescents (Filippi et al. 2014). The solid horizontal lines indicate the mean difference (percentage) between the two measures and the broken horizontal lines indicate the lower and upper LOA. The exponentiated mean difference and LOA represent the ratio of intakes estimated by the two questionnaires. Agreement was considered acceptable if the LOA ranged from 50% to 200%. The analysis was performed on log-transformed, energy-adjusted data. Table 6.12 shows the exponentiated mean difference and 95% LOA of a broader range of food groups. The LOA were within 50% and 200% for food groups, apart from water.

Figure 6.4
Figure 6.4. Bland-Altman plots for the reproducibility analysis of legumes, oils, meat and savory food. The solid horizontal lines indicate the mean difference (percentage) between the two measures and the broken horizontal lines indicate the lower and upper Limits of Agreement. From Filippi et al. 2014.

The application of the Bland-Altman method to dietary data is not without criticism because of challenges in interpretation (Willett 2013), but it is used widely in the literature to examine consistency between repeated measures. Giavarina 2015 provides guidance on the Bland-Altman method that can be used by researchers to ease use and interpretation.

6.3.3 Degree of misclassification

Assessment of the degree of misclassification is the simplest method of quantifying the extent of agreement on an individual basis. This approach is often used for qualitative food frequency questionnaires in which the data have been classified according to frequency of food use. The percentage of pairs with exact agreement or within a selected number of units is calculated.

Alternatively, if a semiquantitative food frequency questionnaire has been used, permitting nutrient intakes to be calculated, they can be classified into quartiles, for example, to assess the ability of a food frequency questionnaire to assign individuals to the same quartile of intake on both occasions. The percentages correctly classified into the same quartile, extreme quartiles, within one quartile, as well as the percentage grossly misclassified (i.e., classified into opposite quartiles) can then be calculated. This approach, however, ignores the fact that a certain amount of agreement invariably occurs by chance alone. This limitation can be overcome by using Cohen's weighted kappa statistic (Cohen 1968).
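The cross-classification percentages and a chance-corrected statistic can be computed directly from the quartile assignments. The sketch below is a minimal illustration with made-up assignments for eight people (quartiles coded 0 to 3); the function names are hypothetical, and linear disagreement weights are an assumption, since other weighting schemes (e.g., quadratic) are also used.

```python
def cross_classification(q1, q2, k=4):
    """Proportions in the same category, within one category, and in
    opposite extreme categories for two sets of category codes 0..k-1."""
    n = len(q1)
    same = sum(a == b for a, b in zip(q1, q2)) / n
    within_one = sum(abs(a - b) <= 1 for a, b in zip(q1, q2)) / n
    opposite = sum(abs(a - b) == k - 1 for a, b in zip(q1, q2)) / n
    return same, within_one, opposite


def weighted_kappa(q1, q2, k=4):
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1).
    Undefined (division by zero) if either occasion uses a single category."""
    n = len(q1)
    observed = sum(abs(a - b) for a, b in zip(q1, q2)) / (n * (k - 1))
    p1 = [q1.count(i) / n for i in range(k)]  # marginal proportions, occasion 1
    p2 = [q2.count(j) / n for j in range(k)]  # marginal proportions, occasion 2
    expected = sum(
        p1[i] * p2[j] * abs(i - j) / (k - 1) for i in range(k) for j in range(k)
    )
    return 1.0 - observed / expected


# Hypothetical quartile assignments on two occasions
occasion1 = [0, 0, 1, 1, 2, 2, 3, 3]
occasion2 = [0, 1, 1, 2, 2, 3, 3, 0]

print(cross_classification(occasion1, occasion2))  # (0.5, 0.875, 0.125)
print(round(weighted_kappa(occasion1, occasion2), 2))  # 0.4
```

Here half of the individuals fall in the same quartile on both occasions, yet the weighted kappa is only 0.4 once the agreement expected by chance is removed.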

6.3.4 Paired tests on the mean or median intake

Paired t-tests or the nonparametric Wilcoxon matched-pairs signed-rank test for nonnormally distributed data are commonly used to assess agreement between nutrient intakes on a group basis. No significant difference between the group mean or median intakes for the two sets of data is taken to indicate satisfactory agreement, and hence reproducibility.

The confounding effect of within-person variation on usual nutrient intakes is not considered when a paired t-test or the Wilcoxon signed-rank test is used. When the within-person variation is large relative to the between-person variation, the power of the t-test will be reduced. As a result, non-significant differences in group mean intakes do not necessarily indicate good reproducibility; they may instead reflect the confounding effect of large within-person variation, apparent in a large coefficient of variation (National Research Council 1986).
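The loss of power can be seen in a small constructed example. In the sketch below, two hypothetical retest datasets have the same mean difference of 100 from the first administration, but the differences are far more variable in one of them; the function name and all values are made up for illustration.

```python
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)


first = [1800.0, 2000.0, 2200.0, 2400.0, 2600.0]
# Same mean shift (+100), small versus large variability in the differences
stable_diffs = [90.0, 110.0, 100.0, 95.0, 105.0]
noisy_diffs = [-400.0, 600.0, 100.0, -300.0, 500.0]
retest_stable = [a + d for a, d in zip(first, stable_diffs)]
retest_noisy = [a + d for a, d in zip(first, noisy_diffs)]

print(round(paired_t(first, retest_stable), 2))
print(round(paired_t(first, retest_noisy), 2))
```

With four degrees of freedom the two-sided 5% critical value is about 2.78, so the shift is detected only in the stable dataset; the non-significant result in the noisy dataset says nothing reassuring about agreement.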

6.3.5 Correlation analysis

Pearson's product moment correlation coefficients for normally distributed data or Spearman's nonparametric rank correlation coefficients and/or intraclass correlation coefficients are often calculated to assess reproducibility on an individual (within-pair) basis. Intraclass correlation coefficients (ri) correct for the number of chance-expected agreements. High correlation coefficients relating nutrient intakes measured on two separate occasions are taken as indicative of good overall agreement between the two sets of nutrient data (Ocké et al. 1997b).

Parametric and nonparametric correlation coefficients quantify the extent of the linear trend relating the two sets of results, and not agreement. Additionally, sources of bias in one of the replicates may not be revealed by correlation analysis. For example, assume that results for the second replicate were exactly 10% higher than those obtained on the first occasion. Analysis will indicate perfect correlation (r = 1.0) between the two replicates, but there is far from perfect agreement. Altman et al. 1983 stressed that correlation coefficients cannot be judged on a null hypothesis basis of no correlation because there is an a priori reason to believe that the results correlate. People tend to eat similar foods from day to day; some agreement is to be expected.
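The 10% example can be checked numerically. This is a minimal sketch with hypothetical energy intakes for five people; the helper `pearson_r` is written out by hand so that the example does not depend on any particular library.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical first replicate (kcal/day); second is exactly 10% higher
first = [1500.0, 1800.0, 2000.0, 2300.0, 2600.0]
second = [1.10 * v for v in first]

r = pearson_r(first, second)
mean_diff = mean([b - a for a, b in zip(first, second)])
print("r =", round(r, 3))
print("mean difference =", round(mean_diff, 1))
```

The correlation is perfect while the second replicate is systematically higher by about 200 kcal/day on average, a bias that only a comparison of absolute levels would reveal.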

Notwithstanding the frequent use of correlation analysis, caution is needed when it is applied to evaluate the extent of the agreement in a test-retest design for measuring reproducibility. If the line of equality is drawn, the plot of the test against the retest results can be useful in indicating bias and the presence of outliers. However, the numerical value of r should be interpreted with caution and other statistical measures used to assess reproducibility (see analogous discussion in relation to validity in Section 7.5).

6.4 Summary

Reproducibility of dietary surveys refers to the extent to which a specific dietary method used repeatedly in the same situation gives similar results. In general, reproducibility of a dietary assessment method depends on the time frame of the method, the population group under study, the nutrient of interest, the technique used to measure the foods and quantities consumed, and the between- and within-person variances. True reproducibility cannot be measured in dietary assessment because nutrient intakes vary daily. Instead, it is conventionally estimated using a test-retest design, followed by an assessment of the extent of the agreement between the nutrient intakes obtained on the two separate occasions, by the same method.

Reproducibility studies suggest that the 24h recall and dietary histories over a short time frame can provide a relatively reproducible estimate of the average usual intake for most nutrients for a large group, but not for individuals. The reproducibility of qualitative food frequency questionnaires for classifying individuals according to the frequency of use of certain foods or food groups depends on the frequency with which the foods are consumed. For foods consumed less frequently, reproducibility tends to be less than for those eaten frequently. The reproducibility of semiquantitative food frequency questionnaires depends on their design, the study group, and the nutrients under study. For some food frequency questionnaires, reproducibility may be high because their design limits the recording of variability in food and, hence, nutrient intakes.

Assessments of reproducibility should use a combination of statistical methods to lend confidence to interpretations. Statistical methods for assessing reproducibility on a group average (aggregate) basis using a test-retest design include paired tests on the mean (paired t-tests) or the median intakes (Wilcoxon matched-pairs signed-rank test). At the individual level, the simplest method for testing agreement is to calculate the percentage of misclassification by comparing the number of pairs with exact agreement or within a selected number of units. For foods, this may be based on frequency of use or amount (in grams), whereas for nutrients, scores (intakes per 1000 kcal) may be used. Cohen's weighted kappa statistic should then be calculated to account for agreement occurring by chance. Additional methods for testing agreement between the results at the individual level include the mean and standard deviation of the differences between the two replicates, from which the 95% limits of agreement (LOA; mean difference ±2SDs) described by Bland and Altman can be calculated. Alternatively, correlation analysis can be performed, with the use of intraclass correlation coefficients (ri) preferred because they correct for the number of chance-expected agreements.

Acknowledgments

The author is very grateful to the late Michael Jory, who initiated the HTML design and, until recently, worked tirelessly to direct the transition to this HTML version from MS-Word drafts. James Spyker’s ongoing HTML support is much appreciated.