Introduction

Chronic obstructive pulmonary disease (COPD) represents a significant cause of mortality and morbidity worldwide.1 According to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) strategy, the diagnosis of COPD is still based on spirometry, but symptom assessment is crucial for proper treatment decisions.2

From a clinician’s perspective, there is a need for a reliable and validated assessment tool that could improve communication between the physician and patient. The COPD Assessment Test (CAT) is recommended by the GOLD report as a short, simple questionnaire to measure the impairment of health status in COPD.3 The questionnaire comprises 8 items relating to cough severity, phlegm, chest tightness, breathlessness, activity limitation, confidence, sleep, and energy. The total score ranges from 0 to 40, with a score of 0 representing no impairment.3 The tool allows a comprehensive symptom assessment, and its importance is growing according to the current GOLD report. Thus, CAT has been widely used for assessing and monitoring COPD, and several language versions have been published and validated.4-7 The questionnaire is also available in a Polish-language version (see Supplementary material), and it has been widely used in clinical practice for the past 10 years. However, the tool has not been validated in Poland so far. Therefore, the aim of the study was to validate the Polish-language version of the CAT questionnaire by assessing its reproducibility and reliability, as only validated questionnaires should be recommended for research and clinical practice.

Validation of the original English version of the questionnaire was completed previously, and it was found that the tool had good measurement properties with excellent internal consistency.3 The Cronbach α coefficient for the questionnaire was very good (α = 0.88; P <0.05). The validation process of the original CAT questionnaire included an intraclass correlation as well as sensitivity to change.3 The original version was validated for specific items; in addition, selected clinical criteria were incorporated.

Patients and methods

Study design

The study was a substudy of the POPE project (Phenotypes of COPD in Central and Eastern Europe), an international multicenter observational cross-sectional survey of COPD patients in Central and Eastern European countries. During the study, the validation process of the CAT questionnaire was completed at different sites representing a range of Polish regions participating in the POPE study.7,8 The coordinating center was the Medical University of Silesia in Katowice. The study was performed in accordance with the principles of the Declaration of Helsinki. After reviewing the study protocol, the Ethics Committee of the Polish coordinating center decided that ethical approval was not required because of the noninterventional study design.

The methodology of the POPE study was described in detail elsewhere.7 Patients with a clinically confirmed diagnosis of COPD established at least 12 months before the visit were recruited to the study. The diagnosis was based on clinical data and irreversible airway obstruction on spirometry. The inclusion criteria were an age of 40 to 80 years and a stable course of disease for at least 4 weeks prior to the survey. Smoking history was not obligatory, and other exposure risk factors were also allowed. Comorbid diseases were allowed, but exacerbation of any medical condition was considered an exclusion criterion.

Validation of a newly developed measure requires a more complex statistical analysis. If an original questionnaire is translated into another language, the literal or linguistic equivalence of the 2 versions does not ensure that they have the same psychometric properties. The consistency of a translated version of the questionnaire can be evaluated using its internal consistency and test–retest reliability.9

The validation procedure of the translated Polish version of the CAT questionnaire involved complex methods applied in a reliability analysis10: 1) the analysis of statistical properties of test items (the assessment of internal consistency) and the relation of items to general test result, and 2) test–retest reliability, that is, the comparison of double tests with the same method (the estimation of the internal stability of the test).

The Polish version of the CAT questionnaire was self-completed by the patient during a visit to an outpatient pulmonary department. To assess test–retest reliability, the questionnaire was completed twice by each patient, with 1 hour between the 2 completions. There is no specific limitation for the period of time required between completions of the questionnaire in the test–retest reliability procedure. Symptoms or complaints are not expected to change significantly within 1 hour in a patient with stable COPD. The period of time is also long enough to ensure that the patient does not replicate all answers chosen during the previous completion.

The Polish-language version of the questionnaire validated in this study was translated by GOLD Committee affiliates. The translation followed standard procedures for questionnaire translation.10

Internal consistency

Testing for homogeneity of the measurement is an important procedure assessing the reliability of an instrument. Internal consistency is defined as the correlations between the items in the scale or within each scale domain or correlations between the items and the total score. Internal consistency is measured by applying the Cronbach α coefficient. The calculation is based on an average correlation among the items and the number of items in the instrument; thus, the coefficient reaches the values between 0 and 1.11,12

External validity

To assess external validity, the analysis also included an assessment of the association between CAT scores and the modified Medical Research Council (mMRC) dyspnea scale, as well as spirometry parameters: pre- and postbronchodilator spirometry (postbronchodilator forced expiratory volume in 1 second [FEV1, l and % predicted], postbronchodilator forced vital capacity [FVC, l and % predicted], and postbronchodilator ratio of FEV1 to FVC), as well as the 6-minute walk test (6MWT).

Test–retest reliability

A reliable measure is stable or consistent and produces similar results when administered repeatedly in the absence of any real change. To assess test–retest reliability, the instrument is administered to the same population on 2 occasions (stable over the interval between assessments) and the 2 scores are assessed for consistency. The results could be influenced by the possibility of practice effects, which can artificially inflate the estimate of reliability.13,14 The repeatability of the questionnaire was assessed using the Bland–Altman procedure and the Cohen κ statistical test.
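The Bland–Altman procedure mentioned above can be illustrated with a minimal sketch. The scores below are hypothetical and are not study data, the function name `bland_altman` is introduced only for illustration, and the 1.96 multiplier assumes approximately normally distributed differences.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement
    are bias -/+ 1.96 standard deviations of those differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical CAT total scores from two completions by the same 8 patients.
first_completion = [12, 25, 18, 30, 9, 21, 15, 27]
second_completion = [13, 24, 18, 31, 10, 20, 17, 27]

bias, lower, upper = bland_altman(first_completion, second_completion)
print(f"bias = {bias:.3f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

A bias close to 0 with narrow limits of agreement indicates that the 2 completions yield interchangeable scores.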

Statistical analysis

Statistical analysis was performed using standard procedures available in the Statistica 12.0 software package (StatSoft Inc., Tulsa, Oklahoma, United States) and SAS, version 9.2 (SAS Institute Inc., Cary, North Carolina, United States). The normality of distribution for continuous variables was assessed by the Shapiro–Wilk test. Statistical significance of differences between continuous variables was analyzed by the t test, and in the case of nonnormal distribution, the Mann–Whitney test and Wilcoxon signed-rank test for paired variables were used. Differences between categorical variables were examined by the χ2 test.

The level of internal consistency of the test was assessed by analyzing the correlation of the answers to individual questions with the total score of the questionnaire on the basis of the Cronbach statistics. The raw and standardized Cronbach α coefficients were calculated (the standardized coefficient is based on answers scaled so that the standard deviation of each question equals 1). A satisfactory level of consistency was defined as a standardized α value exceeding 0.70. Apart from calculating the overall α statistics, the impact of individual questions on the consistency of the questionnaire was assessed by analyzing the potential improvement of the α value after removal of each question in turn.
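As an illustration of this procedure, the sketch below computes the raw Cronbach α and the α obtained after removing each question in turn. It uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The answers are hypothetical, and the function names are introduced only for this example; the study itself used Statistica and SAS.

```python
from statistics import variance

def cronbach_alpha(items):
    """Raw Cronbach alpha.

    `items` is a list of k lists, one per question, each holding the
    answers of the same n respondents in the same order.
    """
    k = len(items)
    totals = [sum(answers) for answers in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_deleted(items):
    """Alpha recomputed after dropping each question in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical answers (scale 0-5) of 6 respondents to 4 questions.
toy_items = [
    [1, 2, 3, 4, 5, 3],
    [2, 2, 3, 5, 4, 3],
    [1, 3, 3, 4, 5, 2],
    [0, 2, 4, 3, 5, 3],
]

alpha = cronbach_alpha(toy_items)
deleted = alpha_if_deleted(toy_items)
print(round(alpha, 3), [round(a, 3) for a in deleted])
```

If removing a question raises α noticeably above the overall value, that question weakens the consistency of the scale.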

To assess the external accuracy of the tool, the analysis also included an assessment of the association between CAT scores and the mMRC dyspnea scale, 6MWT, and spirometry parameters.
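The associations are reported in the Results as Spearman rank correlations. A minimal sketch of that coefficient, computed as the Pearson correlation of average ranks so that tied values are handled, is given below. All values are hypothetical and the helper names are introduced for illustration only.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return numerator / denominator

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data for 6 patients.
cat_total = [10, 14, 18, 22, 27, 31]
mmrc_grade = [1, 1, 2, 2, 3, 4]
walk_distance_m = [480, 450, 400, 420, 330, 300]

r_mmrc = spearman(cat_total, mmrc_grade)
r_walk = spearman(cat_total, walk_distance_m)
print(round(r_mmrc, 2), round(r_walk, 2))
```

In this toy example the CAT score rises with the dyspnea grade (positive coefficient) and falls with walking distance (negative coefficient).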

The repeatability of the responses was assessed using the Bland–Altman procedure as well as the Cohen κ statistical test. The statistical agreement was determined with the conventional scale, with the κ values of 0.81 to 1.00 denoting almost perfect agreement; 0.61 to 0.80, substantial agreement; 0.41 to 0.60, moderate agreement; 0.20 to 0.40, fair agreement; and <0.20, slight agreement.15
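The κ statistic and the agreement scale quoted above can be sketched as follows. This is the unweighted Cohen κ; whether the study used a weighted variant is not stated, so that choice is an assumption here. The answers are hypothetical and the function names are illustrative only.

```python
from collections import Counter

def cohen_kappa(first, second):
    """Unweighted Cohen kappa for two sets of categorical answers."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    count_first, count_second = Counter(first), Counter(second)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(count_first[c] * count_second[c] for c in count_first) / n ** 2
    return (observed - expected) / (1 - expected)

def agreement_label(kappa):
    """Map kappa to the conventional scale quoted in the text.

    Values falling between two quoted bands (for example 0.805) are
    assigned to the lower band.
    """
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.20:
        return "fair"
    return "slight"

# Hypothetical answers (0-5) of 10 patients to one question, given twice.
first_answers = [0, 1, 2, 3, 4, 5, 2, 3, 1, 4]
second_answers = [0, 1, 2, 3, 4, 5, 2, 2, 1, 4]

kappa = cohen_kappa(first_answers, second_answers)
print(round(kappa, 3), agreement_label(kappa))
```

Unlike simple percent agreement, κ discounts the agreement that would be expected by chance alone.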

The statistical inferences were based on the P level of significance of less than 0.05.

Results

Characteristics of the study group

The study group comprised 395 ambulatory patients with COPD (258 men and 137 women). The mean (SD) age of patients was 67.9 (9.7) years, with no significant differences between sexes (P ≥0.1). Most patients were former smokers (73.67%), followed by current smokers (20.76%), passive smokers (2.28%), and nonsmokers (3.29%). The only difference between men and women was a smoking history, with men representing significantly higher exposure measured as pack-years (Table 1).

Table 1. Characteristics of the study group

| Parameter | Total (n = 395) | Men (n = 258) | Women (n = 137) | P value^a |
|---|---|---|---|---|
| Age, y | 67 (61; 75) | 69 (60; 77) | 66 (61; 73) | 0.1 |
| Disease duration, y | 8 (3; 12) | 9 (4; 13) | 7 (2; 12) | 0.09 |
| Age at diagnosis, y | 59 (52; 67) | 60 (52; 67) | 59 (53; 65) | 0.6 |
| Smoking, pack-years | 36 (23; 50) | 39.50 (25; 54) | 30 (20; 44) | 0.0004 |
| BMI, kg/m² | 27.59 (24; 30.9) | 27.57 (24.31; 31.24) | 27.59 (23.07; 30.43) | 0.2 |
| 6MWD, m | 400 (327; 460) | 400 (330; 460) | 399 (322; 450) | 0.9 |
| mMRC dyspnea scale | 2 (1; 3) | 2 (1; 3) | 2 (1; 3) | 0.9 |
| Postbronchodilator FEV1, % predicted | 52.40 (39.48; 68.37) | 51.56 (38.24; 67.9) | 55.09 (44.39; 68.89) | 0.07 |
| FEV1/FVC ratio | 0.5 (0.38; 0.6) | 0.47 (0.38; 0.59) | 0.53 (0.40; 0.62) | 0.02 |

Data are presented as median (Q1; Q3).

a Mann–Whitney test

Abbreviations: BMI, body mass index; FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity; mMRC, modified Medical Research Council; 6MWD, 6-minute walk distance; 6MWT, 6-minute walk test

Internal consistency reliability

The reliability analysis based on the baseline completion of the questionnaire showed a raw Cronbach α coefficient of 0.87 and a standardized coefficient of 0.86. The results of partial correlation analysis showed that all correlations between the responses to particular questions and the general result of the test reached a value of 0.5 or higher and ranged from 0.5 to 0.74. Table 2 presents these correlation coefficients together with the raw and standardized Cronbach α coefficients after a possible removal of a given question from the questionnaire, which ranged from 0.84 to 0.86.

Table 2. Impact of the responses to particular questions of the Polish-language version of the CAT questionnaire on questionnaire’s reliability
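
| Question | Correlation coefficient | Cronbach α coefficient (raw) | Cronbach α coefficient (standardized) |
|---|---|---|---|
| 1 | 0.50 | 0.86 | 0.86 |
| 2 | 0.54 | 0.86 | 0.86 |
| 3 | 0.52 | 0.86 | 0.86 |
| 4 | 0.67 | 0.84 | 0.84 |
| 5 | 0.72 | 0.84 | 0.84 |
| 6 | 0.74 | 0.84 | 0.84 |
| 7 | 0.61 | 0.85 | 0.85 |
| 8 | 0.68 | 0.85 | 0.85 |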

Correlations between answers for individual questions were significant (P <0.05) and reached the expected values, ranging from 0.30 to 0.74 (Table 3).

Table 3. Spearman correlation coefficients for the answers to individual CAT questions

| Question no. | Question 1 | Question 2 | Question 3 | Question 4 | Question 5 | Question 6 | Question 7 | Question 8 |
|---|---|---|---|---|---|---|---|---|
| Question 1 | 1.00 | 0.62 | 0.30 | 0.31 | 0.32 | 0.34 | 0.33 | 0.37 |
| Question 2 | 0.62 | 1.00 | 0.36 | 0.37 | 0.34 | 0.40 | 0.39 | 0.34 |
| Question 3 | 0.30 | 0.36 | 1.00 | 0.38 | 0.37 | 0.45 | 0.46 | 0.33 |
| Question 4 | 0.31 | 0.37 | 0.38 | 1.00 | 0.74 | 0.62 | 0.44 | 0.56 |
| Question 5 | 0.32 | 0.34 | 0.37 | 0.74 | 1.00 | 0.70 | 0.45 | 0.65 |
| Question 6 | 0.34 | 0.40 | 0.45 | 0.62 | 0.70 | 1.00 | 0.54 | 0.65 |
| Question 7 | 0.33 | 0.39 | 0.46 | 0.44 | 0.45 | 0.54 | 1.00 | 0.50 |
| Question 8 | 0.37 | 0.34 | 0.33 | 0.56 | 0.65 | 0.65 | 0.50 | 1.00 |
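As a rough plausibility check, the inter-item coefficients in Table 3 can be plugged into the formula for the standardized Cronbach α, α = k × r̄ / (1 + (k − 1) × r̄), where k is the number of questions and r̄ is the mean inter-item correlation. This is only a sketch: the standardized α is conventionally computed from Pearson correlations, whereas Table 3 reports Spearman coefficients, so exact agreement with the published value should not be expected.

```python
# Upper triangle of Table 3 (Spearman coefficients between the 8 CAT questions).
upper_triangle = [
    [0.62, 0.30, 0.31, 0.32, 0.34, 0.33, 0.37],  # Question 1 vs 2..8
    [0.36, 0.37, 0.34, 0.40, 0.39, 0.34],        # Question 2 vs 3..8
    [0.38, 0.37, 0.45, 0.46, 0.33],              # Question 3 vs 4..8
    [0.74, 0.62, 0.44, 0.56],                    # Question 4 vs 5..8
    [0.70, 0.45, 0.65],                          # Question 5 vs 6..8
    [0.54, 0.65],                                # Question 6 vs 7..8
    [0.50],                                      # Question 7 vs 8
]

pairs = [r for row in upper_triangle for r in row]
k = len(upper_triangle) + 1          # number of questions
mean_r = sum(pairs) / len(pairs)     # mean inter-item correlation
standardized_alpha = k * mean_r / (1 + (k - 1) * mean_r)

print(round(mean_r, 2), round(standardized_alpha, 2))
```

The result is about 0.87, close to the reported raw (0.87) and standardized (0.86) coefficients.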

Test–retest reliability and interrater reliability

To assess test–retest reliability, the instrument was administered to the same study group again after 1 hour. There were no differences between the results for single questions or for the total scores obtained during the 2 measurements. The correlations between the total scores from test–retest measurements were very good (Spearman rank correlation R = 0.95; P <0.001). The Polish-language CAT questionnaire was characterized by a very good repeatability, with the κ coefficient ranging from 0.76 to 0.85 (P <0.01). Details are presented in Table 4.

Table 4. Kappa coefficient for each question in 2 CAT measurements (n = 322)

| Question | Measurement 1, mean (SD) | Measurement 2, mean (SD) | P value^a | κ coefficient (95% CI) |
|---|---|---|---|---|
| Question 1 | 2.34 (1.22) | 2.33 (1.69) | 0.8 | 0.84 (0.79–0.88) |
| Question 2 | 2.29 (1.37) | 2.26 (1.32) | 0.7 | 0.77 (0.72–0.81) |
| Question 3 | 1.73 (1.42) | 1.79 (1.43) | 0.6 | 0.77 (0.72–0.82) |
| Question 4 | 3.57 (1.47) | 3.62 (1.43) | 0.3 | 0.85 (0.81–0.88) |
| Question 5 | 2.83 (1.60) | 2.94 (1.58) | 0.6 | 0.82 (0.78–0.86) |
| Question 6 | 2.08 (1.57) | 2.16 (1.59) | 0.5 | 0.82 (0.78–0.86) |
| Question 7 | 2.01 (1.59) | 2.04 (1.59) | 0.7 | 0.81 (0.76–0.85) |
| Question 8 | 2.58 (1.49) | 2.63 (1.45) | 0.7 | 0.76 (0.71–0.81) |
| Total | 19.44 (8.54) | 19.78 (1.02) | 0.6 | 0.81 (0.77–0.85) |

a Difference between 2 CAT measurements by the Wilcoxon signed-rank test

According to the Wilcoxon signed-rank test for paired variables, there were no significant differences between the results for single questions or for the total scores obtained during the 2 measurements. The Bland–Altman analysis also revealed very good test–retest reliability and interrater reliability, with a mean difference between the 2 measurements of –0.556 (95% CI, –0.767 to –0.345; Figure 1).

Figure 1. Repeatability of the measurement of the Polish version of the CAT questionnaire: the results of the Bland–Altman procedure

The relation between the CAT and other clinical measures

The analysis of external validity of the CAT scores showed significant correlations between the total score of the Polish-language version of the CAT and the mMRC dyspnea scale (R = 0.57), 6MWT (R = –0.32), and pulmonary function parameters such as FEV1 [l] (R = –0.37), FEV1 [% predicted] (R = –0.38), FVC [l] (R = –0.12), FVC [% predicted] (R = –0.31), and the ratio of FEV1 to FVC (R = –0.22) (Table 5).

Table 5. Spearman rank correlations between Polish-language version of the CAT questionnaire and modified Medical Research Council dyspnea scale, 6-minute walk test, and selected spirometry values

| Variable | R | P value |
|---|---|---|
| mMRC | 0.57 | 0.03 |
| 6MWT | –0.32 | <0.001 |
| FEV1, l | –0.37 | <0.001 |
| FEV1, % predicted | –0.38 | <0.001 |
| FVC, l | –0.12 | <0.001 |
| FVC, % predicted | –0.31 | <0.001 |
| FEV1/FVC | –0.22 | <0.001 |

Abbreviations: see Table 1

The CAT scores were not affected by the age or sex of participants, or by their educational level.

Discussion

Our study showed that the psychometric properties of the Polish version of the CAT questionnaire are satisfactory. The validation process included the assessment of internal consistency, the relation of test items with the general test result, and test–retest reliability as the estimation of the internal stability of the test.

The CAT questionnaire has been translated into several languages, some of which have been validated. Our results are consistent with those previously reported in other populations, both from Europe16 and outside of Europe.4,17 Our study showed the expected reliability for each item, with the Cronbach α coefficient ranging from 0.84 to 0.86 after removal of individual items. The Cronbach α coefficient for the questionnaire was 0.87 (thus, it could be interpreted as very good) and was similar to that for the original version (α = 0.88).3 For comparison, the Cronbach α coefficient for Hindi was 0.83,4 and for Korean, 0.85.5

The test–retest reliability of the Polish version was very good, with a Spearman rank coefficient of 0.95 (P <0.001) and good repeatability also for each question. The test–retest reliability of the original version of the questionnaire was also very good, with an intraclass correlation coefficient of 0.8.3

Unlike in our study, the validation of the original version did not include assessment of the relation between questionnaire results and other clinical parameters such as dyspnea scale or spirometry. The instruments we used in our study were the mMRC dyspnea scale, 6MWT, and selected spirometry parameters. We observed significant correlations between the total score of the Polish-language CAT questionnaire and mMRC, 6MWT, FEV1, FVC, and the ratio of FEV1 to FVC. A correlation between limitation of physical activity due to dyspnea and CAT results was also reported in other studies.6,18,19 The correlation between the mMRC dyspnea scale and CAT total scores was only moderate (R = 0.57), which suggests that both tools should be used as complementary measures for a more comprehensive clinical assessment. The CAT questionnaire covers a wider spectrum of the patient’s daily functioning, and questions relating to dyspnea are only part of the assessment. Although CAT and mMRC are proposed as equivalent measures for patient classification and treatment stratification,2 the correlation between both measures reported in other studies ranges from 0.29 to 0.62.20,21

In the case of 6MWT and spirometry parameters, we observed weak but significant inverse correlations with the CAT score. This shows that in our study, airflow limitation and limited physical activity were associated with increasing CAT scores. A similar negative correlation between the 6MWT results and CAT scores was found in a Portuguese study.6 The correlations between CAT scores and spirometry were analyzed as additional observations only in selected language validation studies, but not in the validation study of the original version. The correlation between CAT results and FEV1 and FVC was assessed in the validation of the Arabic version but was reported as nonsignificant.22 On the other hand, validations of Hindi5 and Portuguese versions showed significant correlations.6 Thus, most of our results were in line with the observations from other studies.5,6,17,18

Study limitations

The fact that we did not assess the sensitivity to change in our study may constitute a potential limitation. Sensitivity to change is defined by an instrument’s responsiveness to detect the change. It requires correlating its scores with other measures reflecting any anticipated changes. The responsiveness is not required if the validation process concerns a translated version of the originally validated questionnaire. Therefore, the sensitivity to change was not assessed in our study or other validation studies, but it seems to be crucial for the assessment of questionnaire utility. Additionally, the responsiveness was not assessed in the validation study of the original CAT questionnaire.

Strengths of the study

A considerable strength of our study is a relatively large, community-based, urban–rural population, as the Polish CAT validation study was part of the multicenter POPE study. Patients represented different ages and educational levels. Moreover, all participants were recruited based on real-life criteria of the study; thus, data could be translated into everyday practice.

In addition to good data quality, several independent statistical methods suitable for test validation were used and correlations between CAT results and clinical data were analyzed.

Significant heterogeneity of COPD patients due to differences in clinical presentation, response to therapy, as well as concomitant diseases may impact the patient’s functioning. Our study showed that age and concomitant diseases are associated with disease severity, which is in line with other studies.23,24 Thus, symptom assessment is crucial in the context of treatment and should be based on simple and reliable methods. Patient-related outcome measures should correspond to the specific clinical situation and provide an opportunity to improve the quality of care. The Polish version of the CAT questionnaire meets all necessary criteria of a validation process and could be recommended for clinical practice. So far, the CAT is one of the available tools to establish a threshold at which patients become sufficiently symptomatic to justify regular treatment. However, further research is needed to elucidate some remaining issues, including the utility of this tool for clinical assessment. Consistent use of the same validated methods for such assessments would allow us to create a more patient-oriented approach in COPD treatment.

Conclusions

The Polish-language version of the CAT questionnaire is a valid, reproducible, and reliable instrument for evaluating patients with COPD. This version of the questionnaire should be recommended for the assessment of COPD in clinical practice in Poland.