The results for the 7 domains of the SarQoL questionnaire in the overall sample show considerably higher SEM and SDC values than the overall score. These values appear to be inversely related to the number of items in each domain. The 3 domains with the fewest items (D6: 2 items; D3: 3 items; D7: 4 items) show the largest SEM and SDC values, between 6.89 and 9.22 points for the SEM and between 19.09 and 25.51 points for the SDC. This contrasts with the 4 domains with more items (D1: 8 items; D2: 9 items; D4: 14 items; D5: 15 items), which have SEM values between 3.71 and 5.11 points and SDC values between 10.27 and 14.17 points. It is not surprising that a domain score based on a greater number of items is more precise and less variable, as reflected in the standard deviation of the differences between test and retest results. Typically, the SEM is used as a measure of "absolute" reliability, while the ICC provides a "relative" measure of reliability. This absolute character makes it possible to quantify, in the actual units of the instrument, how much confidence can be placed in a measurement. The mean difference in the full sample for the SarQoL total score is 0.18 points (95% CI = -0.26; 0.63), showing no systematic bias between the two administrations of the questionnaire, because the confidence interval contains zero. The mean differences in the full sample for the 7 domains are not significant (95% CI contains zero) for domains 2, 3, 4, 5, 6 and 7, again indicating the absence of systematic bias. One domain of the full sample shows a small but significant mean difference, namely domain 1 (0.86 points (0.04; 1.68)), indicating the presence of a very small systematic error. The full results of the Bland-Altman analysis are presented in Table 3.
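The "CI contains zero" check described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `bias_ci` and the score values are my own, and the deterministic example uses a pure 5-point shift so that the mean difference and its (degenerate) interval are exact.

```python
import numpy as np

def bias_ci(test, retest, z=1.96):
    """Mean test-retest difference with an approximate 95% CI.
    A CI that contains zero suggests no systematic bias between
    the two administrations."""
    diff = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    mean_d = diff.mean()
    se = diff.std(ddof=1) / np.sqrt(diff.size)  # standard error of the mean difference
    return mean_d, (mean_d - z * se, mean_d + z * se)

# Hypothetical scores (not the study data): a constant 5-point shift
# between administrations gives a mean difference of exactly 5 and a
# CI that excludes zero, i.e. clear systematic bias.
t1 = np.array([60.0, 55.0, 70.0, 48.0])
t2 = t1 - 5.0
mean_d, (lo, hi) = bias_ci(t1, t2)
```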
A Bland-Altman plot for the total score in the full sample is provided as Fig. 1. The standard error of estimation (SEest) is another form of the SEM used in tests such as the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV). The SEest takes into account that values closer to the mean are likely to be more accurate than extreme values. The WISC-IV manual provides a table to interpret these values, which are unevenly distributed and therefore difficult to calculate, even for professionals. The SEM can also be estimated directly as the SD of a large number of repeated measurements from a single subject. A total of 278 sarcopenic subjects aged 77.67 ± 7.64 years, 61.5% of them women, were included. The SEM for the total SarQoL score ranged from 0.18 to 4.20 points across the individual studies and was 2.65 points when all subjects were analyzed together. The SDC for the total score ranged from 0.49 to 11.65 points across the individual studies and was 7.35 points for all subjects.
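The SEM/SEest distinction above can be made concrete with the classical true-score formulas, SEM = SD·√(1 − r) and SEest = SD·√(r(1 − r)), where r is the reliability coefficient. This is a hedged sketch: the function names are mine, and the SD = 15 / r = 0.97 values are illustrative of an IQ-style scale, not taken from the WISC-IV manual.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def se_est(sd, reliability):
    """Standard error of estimation: SD * sqrt(r * (1 - r)).
    Always <= SEM, reflecting the regression of estimated true
    scores toward the mean."""
    return sd * math.sqrt(reliability * (1.0 - reliability))

# Illustrative values: scale SD = 15, assumed reliability r = 0.97.
sem_val = sem(15, 0.97)       # about 2.60 points
se_est_val = se_est(15, 0.97) # slightly smaller, about 2.56 points
```

Note that for any 0 < r < 1, SEest = SEM·√r, which is why the estimation error is always a bit smaller than the measurement error.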
The Bland-Altman plots showed no systematic error in the questionnaire. The standard error of measurement plays a complementary role to the reliability coefficient. Reliability can be understood as the degree to which a test is consistent, reproducible and dependable. The reliability coefficient varies from 0 to 1: in a perfectly reliable test, all of the observed-score variance is due to true-score variance, whereas in a completely unreliable test, all of the observed-score variance is the result of error. Although the reliability coefficient provides important information about the amount of error in a test measured in a group or population, it says nothing about the error present in an individual test result. As Jeremy pointed out, there are several versions of the ICC that reflect different ways of accounting for rater or item variance within the overall variance. A good summary of the use of kappa and ICC indices for rater reliability is given in "Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial" by Kevin A. Hallgren, and the different versions of the ICC are discussed in a related article. In short, you need to decide whether your raters are considered to be drawn from a larger pool of potential raters or treated as a fixed group. The first case calls for a random-effects model, while in the second the raters are treated as fixed effects. Similarly, items can be treated as fixed or random. Typically, a two-way random-effects model (raters and items both treated as random effects) is used, with ICC(2,k) for absolute agreement if you are interested in systematic error between raters.
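The ICC variants mentioned above can be computed directly from two-way ANOVA mean squares using the Shrout & Fleiss formulas. The sketch below is my own illustration (the function name `icc2` is not from the source); it reproduces the classic 6-targets-by-4-judges example from Shrout & Fleiss (1979), for which the published single-rater and average-rater values are about 0.29 and 0.62.

```python
import numpy as np

def icc2(ratings):
    """ICC(2,1) and ICC(2,k) from Shrout & Fleiss two-way
    random-effects mean squares. `ratings` is an (n subjects x
    k raters) array."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Mean squares for rows (subjects), columns (raters), residual.
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    sse = ((x - grand) ** 2).sum() - (n - 1) * msr - (k - 1) * msc
    mse = sse / ((n - 1) * (k - 1))
    icc_21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_2k = (msr - mse) / (msr + (msc - mse) / n)
    return icc_21, icc_2k

# Classic illustrative data (Shrout & Fleiss, 1979): 6 targets, 4 judges.
data = [[9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7]]
single, average = icc2(data)  # roughly 0.29 and 0.62
```

Because ICC(2,·) keeps the rater variance (MSC) in the denominator, it measures absolute agreement; dropping that term would give the consistency-type ICC(3,·) instead.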
The SEM can be calculated from the square root of the mean square error of a one-way ANOVA, or from the sample standard deviation, as suggested in the other response. Subjects from 9 studies (conducted in Belgium, Brazil, the Czech Republic, England, Greece, Lithuania, Poland and Spain) were included. The SEM, a measure of the error in scores that is not due to true change, was calculated by dividing the standard deviation of the differences between test and retest scores (SDdiff) by √2. The SDC, defined as a change beyond the measurement error, was calculated by multiplying the SDdiff by 1.96. The Bland-Altman plots were examined for the presence of systematic error. A Bland-Altman plot provides a visual representation of the presence of systematic error in an instrument. It is based on three quantities: the mean systematic difference between test and retest results (d̄) and the upper and lower limits of agreement, which include 95% of the observations, assuming that the test-retest differences are normally distributed [18,19]. These quantities are superimposed on a scatter plot with the difference between the test and retest values on the Y axis and the mean of the test and retest values on the X axis.
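The two formulas described above (SEM = SDdiff/√2 and SDC = 1.96·SDdiff), together with the Bland-Altman limits of agreement, can be sketched as below. The function names are my own and the input arrays are placeholders, not the study data.

```python
import numpy as np

def sem_sdc(test, retest):
    """SEM = SDdiff / sqrt(2) and SDC = 1.96 * SDdiff, following
    the formulas described in the text."""
    diff = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    sd_diff = diff.std(ddof=1)
    return sd_diff / np.sqrt(2.0), 1.96 * sd_diff

def limits_of_agreement(test, retest):
    """Bland-Altman mean difference d-bar and 95% limits of
    agreement (d-bar +/- 1.96 * SDdiff)."""
    diff = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    d_bar = diff.mean()
    sd_diff = diff.std(ddof=1)
    return d_bar, (d_bar - 1.96 * sd_diff, d_bar + 1.96 * sd_diff)
```

Note the two quantities are tied together: SDC = 1.96·SDdiff = 1.96·√2·SEM, so the SDC is always about 2.77 times the SEM under these definitions.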
The present study, which analyzed a sample of 278 subjects from 9 validation studies, found a standard error of measurement of 2.65 points and a smallest detectable change of 7.35 points for the overall SarQoL questionnaire score. These values can be applied in future longitudinal research to assess the accuracy of measured changes. The standard error of measurement (SEM) quantifies how much observed test scores are spread around a "true" score. The SEM is particularly meaningful for an individual test-taker because it applies to a single score and is expressed in the same units as the test. The main objective of this study is to determine the SEM and SDC of the SarQoL questionnaire in a sample of subjects from 9 international validation studies. The secondary objectives are to examine the measurement error of the questionnaire using a Bland-Altman analysis and to update the previously obtained test-retest reliability results for the SarQoL questionnaire in the full sample. The test-retest reliability of a questionnaire quantifies the extent to which it gives the same results on repeated administrations, provided that the health of the participants remains stable.