Review the articles and use the template to demonstrate each of the four experimental designs (reversal, multiple baseline, changing criterion, and alternating treatments), and discuss the strengths and limitations, as well as all forms of validity, for each experimental design.
Experimental Design and Validity
Reversal Design
From the articles in the article bank provided by your instructor, choose one that demonstrates reversal design and complete the following.
APA citation | Full APA citation here.
Strengths | 1. Strength of reversal design 2. Another strength of reversal design
Limitations | 1. Limitation of reversal design 2. Another limitation of reversal design
External Validity | First explain what external validity is. Then explain how external validity was present or absent with support from the article.
Internal Validity | First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.
Social Validity | First explain what social validity is. Then explain how social validity was present or absent with support from the article.
Multiple Baseline Design
From the articles in the article bank provided by your instructor, choose one that demonstrates multiple baseline design and complete the following.
APA citation | Full APA citation here.
Strengths | 1. Strength of multiple baseline design 2. Another strength of multiple baseline design
Limitations | 1. Limitation of multiple baseline design 2. Another limitation of multiple baseline design
External Validity | First explain what external validity is. Then explain how external validity was present or absent with support from the article.
Internal Validity | First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.
Social Validity | First explain what social validity is. Then explain how social validity was present or absent with support from the article.
Changing Criterion Design
From the articles in the article bank provided by your instructor, choose one that demonstrates changing criterion design and complete the following.
APA citation | Full APA citation here.
Strengths | 1. Strength of changing criterion design 2. Another strength of changing criterion design
Limitations | 1. Limitation of changing criterion design 2. Another limitation of changing criterion design
External Validity | First explain what external validity is. Then explain how external validity was present or absent with support from the article.
Internal Validity | First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.
Social Validity | First explain what social validity is. Then explain how social validity was present or absent with support from the article.
Alternating Treatments Design
From the articles in the article bank provided by your instructor, choose one that demonstrates alternating treatments design and complete the following.
APA citation | Full APA citation here.
Strengths | 1. Strength of alternating treatments design 2. Another strength of alternating treatments design
Limitations | 1. Limitation of alternating treatments design 2. Another limitation of alternating treatments design
External Validity | First explain what external validity is. Then explain how external validity was present or absent with support from the article.
Internal Validity | First explain what internal validity is. Then explain how internal validity was present or absent with support from the article.
Social Validity | First explain what social validity is. Then explain how social validity was present or absent with support from the article.
RESEARCH ARTICLE
A systematic review of social-validity assessments in the Journal of Applied Behavior Analysis: 2010–2020
Erin S. Leif, Nadine Kelenc-Gasior, Bradley S. Bloomfield, Brett Furlonger, Russell A. Fox
Faculty of Education, Monash University, Clayton, Victoria, Australia
Correspondence: Erin S. Leif, Faculty of Education, Monash University, 19 Ancora Imparo Way, Clayton VIC 3131, Australia. Email: [email protected]
Editor-in-Chief: John Borrero. Handling Editor: Timothy Vollmer.
DOI: 10.1002/jaba.1092
Received: 4 October 2023; Accepted: 13 May 2024
Abstract: We conducted a systematic review of studies published in the Journal of Applied Behavior Analysis between 2010 and 2020 to identify reports of social validity. A total of 160 studies (17.60%) published during this time included a measure of social validity. For each study, we extracted data on (a) the dimensions of social validity, (b) the methods used for collecting social-validity data, (c) the respondents, and (d) when social-validity data were collected. Most social-validity assessments measured the acceptability of intervention procedures and outcomes, with fewer evaluating goals. The most common method for collecting social-validity data was Likert-type rating scales, followed by non-Likert-type questionnaires. In most studies, the direct recipients of the intervention provided feedback on social validity. Social-validity assessment data were often collected at the conclusion of the study. We provide examples of social-validity measurement methods, discuss their strengths and limitations, and provide recommendations for improving the future collection and reporting of social-validity data.
KEYWORDS consumer satisfaction, intervention acceptability, intervention preference, social validity
Social validity is defined as a consumer's satisfaction with the goals, procedures, and outcomes of intervention programs (Wolf, 1978). Social-validity assessments of behavior-analytic interventions provide participants and relevant stakeholders with the opportunity to give feedback and express their satisfaction with these three dimensions (Wolf, 1978). These assessments may also allow individuals to express their preferences for interventions, which might enhance participation and outcomes (Hanley, 2010). One of the criticisms, however, of published research on behavior-analytic interventions has been the lack of social-validity measurement, as studies have instead predominantly focused on the efficacy and effectiveness of interventions and practices (Callahan et al., 2017; Carr et al., 1999; Ferguson et al., 2019; Huntington et al., 2023). There have been recent calls to improve the collection and reporting of information about the degree to which the direct recipients of behavior-analytic interventions view the procedures used as part of these interventions as acceptable and preferred and the outcomes meaningful (Common & Lane, 2017).
Wolf (1978) noted that the construct of social validity consists of three dimensions: (a) the goals of the intervention, or what behaviors the intervention is intended to change; (b) the procedures used during intervention; and (c) the degree to which intervention effects are meaningful and desirable, including those intended and unpredicted. This conceptualization has been the primary guide for the development of social-validity assessment methods in the behavior-analytic research literature. Social validity may be a critical variable in addressing the research-to-practice gap, as interventions deemed impractical, unacceptable, or harmful may not be adopted or applied in real-world settings (Kazdin, 1977; Kern & Manz, 2004; Leko, 2014; Lloyd & Heubusch, 1996). Assessing the social validity of behavior-analytic interventions may also support the sustainable implementation of evidence-based interventions at a larger scale (Cook et al., 2013; Reimers et al., 1987) and prevent the development and distribution of interventions that are likely to be rejected by consumers and the public (Schwartz & Baer, 1991).
Carr et al. (1999) reviewed research published in the Journal of Applied Behavior Analysis (JABA) from 1968 to 1998 to identify the prevalence of social-validity measures. Two dimensions of social validity were assessed for each study: intervention acceptability and intervention outcomes. On average, during this 31-year period, measures of social validity related to intervention acceptability and outcomes were reported in only 13% of published studies. Carr et al. expressed concerns that failure to report the outcomes of social-validity assessments may prevent researchers and practitioners from identifying the reasons that behavior-analytic interventions may be rejected or discontinued by consumers. Additionally, Carr et al. noted that failure to report the methods used to gather social-validity data from various consumers may prevent the development, refinement, and uptake of these methods.
The methods used by Carr et al. (1999) were replicated and extended by Ferguson et al. (2019), who identified the prevalence and type of social-validity assessments published in JABA between 1999 and 2016. Across this 17-year period, only 12% of studies included a social-validity measure. The social validity of the intervention procedures and outcomes was more likely to be reported than the social validity of intervention goals. The authors noted that most studies used a combination of rating scales, questionnaires, and intervention choice to collect social-validity data. The authors also reported that "other" forms of social-validity measurement were used in 8% of studies, but they did not provide examples of what these types of measurement involved.
Other researchers have explored the prevalence and type of social-validity assessment data published across a range of journals. Snodgrass et al. (2018) systematically reviewed reports of social validity published in six special education journals. All single-case research design studies published in these six journals between 2005 and 2018 were reviewed, with 26.8% (n = 115) reporting results of a social-validity assessment. Of these 115 studies, 28 measured the social validity of the goals, procedures, and outcomes of the intervention. For these 28 studies, questionnaires were the most common method for collecting data (n = 20), the direct recipients of the intervention most often provided data on social validity (n = 19), and most social-validity assessments were administered at or after the intervention concluded (n = 27). However, one limitation of Snodgrass et al. was that the authors limited their assessment of the methods, respondents, and times to only those 28 studies that measured all three dimensions of social validity. Additionally, the authors did not include JABA in their sample of journals.
Most recently, Huntington et al. (2023) assessed social validity across eight behavior-analytic journals between 2010 and 2020, including JABA. Huntington et al. found 47% of studies included in their review reported a measure of social validity, with a large increase evident in 2019 and 2020. The authors highlighted the need for future research to identify and describe methods used to collect social-validity data, the participants who provide social-validity data, and timing of social-validity assessments in behavior-analytic journals. The collection and reporting of these data might provide a clearer picture of how social validity has been measured in studies published in JABA, assist in the evaluation of the quality of the data collected, and provide new insights into how to potentially improve the future assessment of social validity. To this end, our purpose was to systematically identify and appraise social-validity assessments included in studies published in JABA between 2010 and 2020. For the studies included in this review, we sought to identify (a) the dimensions of social validity assessed, (b) the types of methods used to collect social-validity data, (c) the individuals who provided social-validity data (the respondents), and (d) the point at which social-validity assessments were conducted. We provide illustrative examples of different ways to measure social validity and discuss the strengths and potential limitations of different social-validity assessments. Based on these data and examples, we provide recommendations for potentially improving the collection and reporting of social-validity data in behavior-analytic research.
METHOD
A systematic literature review was undertaken to identify studies for inclusion in this report. Figure 1 includes a diagram of the study screening process. Rather than conducting a keyword search of terms related to social validity in various databases, the identification of relevant peer-reviewed studies for inclusion in this review was undertaken by compiling and systematically screening all studies published in JABA from 2010 (Volume 43[1]) to 2020 (Volume 53[4]). All studies were downloaded directly from the journal's website and independently reviewed. A total of 1,059 studies were published in JABA between 2010 and 2020. The search focused on studies published from 2010 onward to allow us to systematically replicate and extend the procedures described by Carr et al. (1999) and Ferguson et al. (2019) within a more recent 10-year period. Additionally, as the purpose of the current review was to provide a more in-depth analysis of the characteristics of social-validity assessments published in JABA, studies published in other journals were not included in the analysis.
Initial study screening procedure
To be included in the current review, the study needed to include at least one human or nonhuman participant.
The following were excluded during the initial screening process: technical reports, systematic reviews, meta-analyses, brief reviews, book reviews, errata, announcements, surveys, issue information, acknowledgments, and reanalyses of previously published data sets. The methods and results sections of all 1,059 studies were examined to determine which studies fulfilled this inclusion criterion. This resulted in the exclusion of 177 studies that did not include at least one human or nonhuman participant.
Inclusion and exclusion criteria
The remaining 882 studies were reviewed a second time for the presence or absence of at least one measure of social validity. First, the following terms were typed into the electronic search bar of the downloaded PDF version of each study: social validity, social validation, social acceptability, intervention validity, intervention acceptability, consumer satisfaction, satisfaction survey, interview, preference, or choice. If this search returned a result, the study was reviewed to locate any social-validity measure. If this search did not yield any results, the methods, results, and discussion sections of the study were reviewed in full to determine whether a social-validity measure was included. If the study did not include a measure of social validity, it was excluded.
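This keyword screening step can be approximated in a few lines of code; the following is a minimal sketch, assuming the downloaded studies are saved as local PDF files and that the pypdf library is available. The folder name and helper function are hypothetical.

```python
from pathlib import Path

from pypdf import PdfReader

# Terms typed into the search bar during the second round of screening.
SEARCH_TERMS = [
    "social validity", "social validation", "social acceptability",
    "intervention validity", "intervention acceptability",
    "consumer satisfaction", "satisfaction survey",
    "interview", "preference", "choice",
]

def flag_for_full_review(pdf_path: Path) -> bool:
    """Return True if any search term appears in the study's extracted text."""
    text = " ".join(
        (page.extract_text() or "") for page in PdfReader(pdf_path).pages
    ).lower()
    return any(term in text for term in SEARCH_TERMS)

# Hypothetical folder holding the 882 studies retained after initial screening.
# A hit only flags the study; locating an actual social-validity measure
# still required reading the study, as described above.
flagged = [p for p in Path("jaba_2010_2020").glob("*.pdf") if flag_for_full_review(p)]
```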
A study was included if it reported any qualitative or quantitative data measuring the social significance of the intervention goals, procedures, or outcomes (Wolf, 1978) or if it included a measure of intervention preference (Hanley, 2010). All studies that included one or more measures of social validity and reported the outcomes of the assessment were retained. Of the 882 reviewed studies, 160 studies reported one or more measures of social validity.
Dependent measures
Data were extracted for each of the 160 studies that included a measure of social validity for the following categories (and category variables): (a) the authors, (b) the year of publication, (c) the dimension of social validity measured (goals, procedures, or outcomes), (d) the specific method that was used to collect social-validity data (e.g., Likert-type rating scales, questionnaires, or interviews), (e) the person who provided the social-validity data (e.g., parents, teachers, or participants), and (f) the specific point(s) at which the social-validity data were collected (e.g., before, during, or after intervention). The data collected as part of this study can be found in the Additional Supporting Information in the online version of this article at the publisher's website.
FIGURE 1 Flow diagram of the study screening process.
Dimensions of social validity
Table 1 provides a definition of each dimension of social validity assessed in the current review. A study was scored as reporting a measure of the social validity of the intervention goals if formal measures were employed to assess consumer acceptance of or agreement with the purpose or purported goals of the intervention and the behaviors targeted for change as part of the intervention. A study was scored as reporting a measure of the social validity of the intervention procedures if formal measures were employed to assess consumer acceptance of, agreement with, or preference for the tactics used to deliver the intervention or to assess the consumer's willingness to continue with intervention. A study was scored as reporting an assessment of the social validity of the intervention outcomes if formal measures were used to assess consumer satisfaction with, social importance of, or practical significance of the intervention effects.
Social-validity assessment methods
Table 2 provides a definition of each method of social-validity assessment included in the current review. Social-validity assessment methods were defined as the specific procedures used to collect data on measures of each dimension of social validity. Social-validity assessment methods included (a) Likert-type rating scales, (b) non-Likert-type questionnaires, (c) direct observations, (d) intervention preference or choice questions, (e) concurrent-chains intervention preference assessments, or (f) interviews.
TABLE 1 Dimensions of social validity assessed (adapted from Wolf, 1978).
Dimension | Definition | Total number of studies | Percentage
Intervention goals | Acceptance of or agreement with the purpose or purported goals of the intervention and the behaviors targeted for change (Are the specific behaviors selected for change and the reasons for behavior change important and valued?) | 26 | 16.25%
Intervention procedures | Acceptance of, agreement with, or preference for the strategies and tactics used to deliver the intervention or willingness to continue with intervention (Are the specific intervention strategies used acceptable and preferred?) | 144 | 90%
Intervention outcomes | Satisfaction with, social importance of, or practical significance of the intervention effects (Are the outcomes associated with the intervention meaningful, including any unexpected outcomes?) | 110 | 68.75%
TABLE 2 Social-validity assessment methods (adapted from Carter & Wheeler, 2019).
Methods | Definition | Total number of studies | Percentage
Likert-type rating scales | A scale that consists of a series of statements or items related to the goals of an intervention, intervention procedures, or outcomes of an intervention for which respondents are asked to indicate their level of agreement or disagreement with each statement. The scale typically ranges from "Strongly Disagree" to "Strongly Agree," with several intermediate response options | 129 | 80.63%
Non-Likert-type questionnaires | A survey or assessment tool that does not use the traditional Likert-type scale format for collecting responses. Questionnaires might include closed-ended response options, including multiple-choice or yes/no questions; visual-analogue scales; or open-ended questions about the intervention | 53 | 33.13%
Direct observations | In vivo or video-based observations in which observers watch intervention sessions and then provide feedback on the intervention, often using Likert-type rating scales or non-Likert-type questionnaires | 41 | 25.63%
Intervention preference or choice | Opportunities for people who are directly involved in the intervention (as recipients or interventionists) to provide feedback on which intervention they prefer or will continue to use following the study. However, the respondent does not experience the intervention after indicating their preference or choice | 17 | 10.63%
Concurrent-chains intervention preference assessments | Opportunities for people who are directly involved in the intervention (as recipients) to choose from available interventions by selecting a discriminative stimulus associated with that intervention and then experiencing their selected intervention following their selection | 15 | 9.38%
Interviews | A conversation facilitated by an interviewer who asks the respondent a range of questions to collect information about their opinion of, satisfaction with, or preference for the interventions' goals, procedures, and outcomes | 5 | 3.13%
Respondents
Table 3 provides a definition of each group of social-validity assessment respondents included in the current review. Respondents were defined as any person who was formally invited by the researchers to participate in a social-validity assessment and included (a) participants who received the intervention; (b) participants who delivered the intervention; (c) parents or caregivers of the participants who received the intervention but who did not deliver the intervention; (d) educators, therapists, instructors, or other professionals who had a relationship with the participants who received the intervention but who did not deliver the intervention; and (e) individuals who were not involved in the study and who did not have a relationship with the participants but were invited by the researchers to provide feedback on the study's goals, procedures, or outcomes.
Social-validity measurement points
Table 4 provides a definition of the different points at which social-validity assessments were conducted in the included studies. If social-validity data were collected prior to the start of the intervention (e.g., by asking parents about the acceptability of the intervention goals), it was coded as "before." If social-validity data were collected during the implementation of the intervention (e.g., by providing participants with a choice of which intervention they would like to experience), it was coded as "during." If social-validity data were collected at the conclusion of intervention, during maintenance or generalization phases, or during follow-up sessions, it was coded as "after." If there was not enough information provided in the methods section of the study to determine the point at which social-validity data were collected, it was coded as "unclear." If social validity was assessed at more than one point (e.g., before and after the study), it was coded as both "before" and "after." If social validity was assessed multiple times at a single point (e.g., assessed three times after the study), it was coded as "after" only one time.
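These coding rules reduce to taking the set of distinct points at which a study assessed social validity; the following is a minimal sketch under the rules above, with the function name and example input as illustrations only.

```python
def code_measurement_points(assessment_points: list[str]) -> set[str]:
    """Collapse repeated assessments into the set of distinct coded points.

    A study assessed at more than one point is coded once per point;
    repeated assessments at the same point are coded only once.
    """
    valid = {"before", "during", "after", "unclear"}
    return {point for point in assessment_points if point in valid}

# Assessed three times after the study and once before:
print(code_measurement_points(["after", "after", "after", "before"]))
# {'before', 'after'}
```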
Data extraction procedures
To extract data, the first author read the methods and results sections for each included study.
TABLE 3 Social-validity respondents.
Respondents | Definition | Total number of studies | Percentage
Participants who received the intervention | Consumers whose behavior was targeted for change through the delivery of the intervention (e.g., children, students, athletes, employees) | 89 | 55.63%
Participants who delivered the intervention | Consumers who delivered the intervention but who were not members of the research team (e.g., parents, teachers, coaches, therapists) | 41 | 25.63%
Parents/caregivers | Family members or primary caregivers for participants who received the intervention but who were not involved in the delivery of the intervention | 27 | 16.88%
Educators/therapists/instructors | Professionals who had a relationship with the participants who received the intervention but who were not involved in the delivery of the intervention | 25 | 15.63%
Individuals who were not involved in the study | Any individual who served as a respondent and provided social-validity data but who did not have a relationship with the participant who received the intervention and/or who was naïve to the purpose of the study | 35 | 21.88%
TABLE 4 Point at which social-validity data were collected.
Measurement point | Definition | Total number of studies | Percentage
Before | Prior to the start of the intervention | 15 | 9.38%
During | Any time during the delivery of the intervention, or when intervention sessions followed the collection of social-validity data and were informed by the social-validity data | 23 | 13.75%
After | After the conclusion of the intervention when no additional intervention sessions were planned or delivered, based on the data, or during maintenance and generalization or follow-up sessions | 133 | 83.13%
Unclear | The information provided in the methods section of the study was not detailed enough to permit the identification of the point at which social-validity data were collected | 13 | 8.13%
If tables were presented that included a list of specific questions asked as part of the social-validity assessment, these were reviewed as well. In some cases, authors provided an example of social-validity data collection tools as part of supplementary materials. When provided, supplementary materials were also reviewed. The presence or absence of each category variable was determined by the presence of keywords in the text, tables, and/or supplementary materials and the description of the dimensions of social validity measured, the methods used to measure social validity, the respondents who provided social-validity data, and the point(s) at which social-validity data were collected, as evident in the text of the study. All data were entered into an author-created Excel spreadsheet to facilitate data analysis (available upon request). The percentage of studies that included each variable was calculated by dividing the total number of studies that measured each dimension by the total number of included studies (n = 160) and multiplying by 100.
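As a worked illustration of that percentage calculation, here is a short sketch using the dimension counts reported in Table 1; the variable names are our own.

```python
# Counts of studies measuring each dimension of social validity (Table 1).
dimension_counts = {"goals": 26, "procedures": 144, "outcomes": 110}
TOTAL_INCLUDED = 160  # studies reporting at least one social-validity measure

for dimension, count in dimension_counts.items():
    percentage = count / TOTAL_INCLUDED * 100
    print(f"{dimension}: {percentage:.2f}%")
# goals: 16.25%, procedures: 90.00%, outcomes: 68.75%
```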
Interrater reliability
Interrater reliability data were collected at four points. First, interrater reliability data were collected for the total number of studies published in JABA from 2010 to 2020. Two independent raters (the third and fifth authors) reviewed all studies in all issues published in four randomly selected years of publication (2011, 2015, 2016, and 2020; 44% of total studies). Years were selected at random using an online random number generator. An agreement was defined as the primary and independent rater calculating the same total number of studies included in each issue. A disagreement was defined as any discrepancy in the total number of studies per issue. Interrater reliability was calculated for each study by adding the total agreements and dividing by the sum of the agreements plus disagreements and multiplying by 100. Interrater reliability for the number of total studies published in JABA was 100%.
Second, interrater reliability data were collected for the initial screening procedure (N = 1,059). The two independent raters reviewed all studies in all issues published in the same four randomly selected years of publication (2011, 2015, 2016, and 2020; 44% of total studies). An agreement was defined as the primary and independent rater calculating the same total number of studies that included at least one human or nonhuman participant for each issue in each year. A disagreement was defined as any discrepancy in the total number of studies identified as including at least one human or nonhuman participant per issue per year. Interrater reliability was calculated for each study by adding the total agreements and dividing by the sum of the agreements plus disagreements and multiplying by 100. Total agreement was calculated by averaging the interrater reliability score across years. Interrater reliability for the number of total studies included following the initial screening process was 96.50%. Any discrepancies (n = 16) were reviewed by the first author and one of the independent raters and resolved.
Third, the two independent raters applied the inclusion and exclusion criteria to the studies retained following initial screening (n = 882). The independent raters reviewed all studies in the same four randomly selected years of publication to determine whether the study included a measure of social validity. If the study included a measure of social validity, the independent raters recorded the authors, title, year, and issue in an Excel workbook that was identical to that used by the primary rater. An agreement was defined as the primary and independent rater selecting the same authors, title, year, and issue. A disagreement was defined as any discrepancy between the studies identified by the two raters. Interrater reliability was calculated for each year by adding the total agreements and dividing by the sum of the agreements plus disagreements for that year and multiplying by 100. Total agreement was calculated by averaging the interrater reliability score across years. Total interrater reliability for the inclusion procedures was 96%. Any discrepancies (n = 4) were reviewed by the first author and one of the independent raters and resolved.
Finally, the second author independently reviewed and coded 84.38% (n = 135) of the included studies. The independent rater followed the same coding procedures described above. Data entered in the Excel workbook by the primary rater were then compared with those entered by the independent rater. An agreement was defined as the primary and independent rater indicating the same presence or absence of each category variable (dimension, method, respondent, and point of collection). A disagreement was defined as any discrepancy in the coding of each category variable between the two raters. Interrater reliability was calculated individually for each study by adding the total agreements and dividing by the sum of the agreements plus disagreements and multiplying by 100. Total agreement was calculated by averaging the interrater reliability score across studies and averaged 95.60% (range: 92%–100%). Any discrepancies (n = 6) were reviewed by the first author and one of the independent raters and resolved.
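Each reliability check above applies the same percentage-agreement formula; the following is a minimal sketch, with the example agreement counts invented for illustration.

```python
def percentage_agreement(agreements: int, disagreements: int) -> float:
    """Agreements divided by agreements plus disagreements, times 100."""
    return agreements / (agreements + disagreements) * 100

# Hypothetical per-year scores, averaged to yield total agreement as above.
yearly = [percentage_agreement(a, d) for a, d in [(30, 1), (28, 0), (25, 2), (33, 1)]]
total_agreement = sum(yearly) / len(yearly)
print(f"Total agreement: {total_agreement:.2f}%")
```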
RESULTS AND DISCUSSION
Prevalence of social validity
Figure 2 depicts the percentage of total studies published per year that included a measure of social validity. Of the 882 studies retained for review, 160 (18.14%) included measures of social validity. Between 2010 and 2019, the total number of studies published in JABA each year (and retained for inclusion in the current review) ranged from 57 (in 2017) to 99 (in 2011). The percentage of these studies reporting results of social-validity data was stable, ranging between 10% and 20%. A notable exception was observed in 2017, when 28% of included studies included a measure of social validity. Interestingly, a large increase in the percentage of studies including a measure of social validity was observed in 2020. In 2020, 137 studies were published and included for review in the current study, and of these, 32.10% included a measure of social validity.
These findings extend those presented by Carr et al. (1999) and Ferguson et al. (2019). Between 1968 and 1998, Carr et al. identified an increasing trend in the number of studies published in JABA, particularly between the mid-1970s and the mid-1980s. Between the mid-1980s and 1998, Carr et al. reported that approximately 25% of studies included a measure of social validity. Between 1999 and 2016, Ferguson et al. identified 1,209 studies that included at least one participant. Of these studies, only 141 (12%) included a measure of social validity, a notable decrease relative to the findings of Carr et al. However, Ferguson et al. noted a variable but increasing trend in the percentage of studies including a measure of social validity, primarily between 2005 and 2016.
In the current study, we found that between 2010 and 2020, on average, 18.14% of studies published in JABA that included at least one participant included a measure of social validity. These data suggest that publication of social-validity assessment data in JABA is increasing. As mentioned above, we found a marked increase in the publication of social-validity assessment data in 2020, with 32.10% of studies including a measure of social validity. These findings replicate those reported by Huntington et al. (2023), who also showed a substantial increase in the number of studies including a measure of social validity in behavior-analytic journals in 2019 and 2020.
Dimensions of social validity
Table 1 depicts the number and percentage of included studies (n = 160) reporting a measure of each dimension of social validity. Assessing consumer acceptance of the procedures used as part of the intervention was the most common dimension of social validity measured, with 90% of included studies reporting a measure of acceptability or satisfaction with procedures used. A measure of consumer satisfaction with intervention outcomes was included in 68.75% of included studies, whereas measures of consumer acceptance or agreement with the goals of intervention were reported less often, in only 16.25% of included studies.
These findings differ from those reported by Carr et al. (1999), who identified between 0% and 30% of studies as including a measure of the acceptability of the procedures used. This value increased to a high of nearly 50% when data on the percentage of studies including a measure of the acceptability of the procedures or perceptions of intervention outcomes were also included. However, Carr et al. did not include data on the percentage of studies including a measure of the social validity of intervention goals. Although the findings of the current study differ from those of Carr et al., they are consistent with those reported by Ferguson et al. (2019), who found that 85% of studies included a measure of the social validity of the intervention procedures, 60% included a measure of the social validity of the outcomes, and only 12% included a measure of the social validity of the intervention goals.
FIGURE 2 Total included studies and percentage of studies reporting social-validity data in the Journal of Applied Behavior Analysis from 2010 to 2020.
In the current study, we found that the acceptability of intervention goals was often assessed concurrently with the acceptability of intervention procedures using Likert-type rating scales. However, some authors conducted observations of behavior prior to implementing any interventions to determine the overall goals for the intervention. In one noteworthy example, Mann and Karsten (2020) asked college students to model different types of typical conversation behaviors and recorded data on the topography of these behaviors. These behaviors were then used as a normative sample to develop socially valid intervention goals for participants. Because procedures designed to assess the social validity of intervention goals are published less frequently, it is possible that behavior analysts are less familiar with how to design these types of social-validity assessments. Alternatively, it is possible that behavior analysts develop individualized and socially valid goals and procedures for intervention through conversations with participants and other stakeholders prior to intervention during the process of gaining informed consent. However, we found that information about these types of informal measures of social validity was not commonly published.
Social-validity methods
Table 2 depicts the number and percentage of studies that used various methods to assess social validity.
Likert-type rating scales
Likert-type rating scales were the most frequently used method, accounting for 80.63% of the total number of studies. In these studies, researchers developed a set of statements and asked respondents to select from a set of response options to indicate how much they agreed with the statement. For example, DiGennaro Reed et al. (2010) evaluated a video-modeling intervention to improve the procedural fidelity of behavioral interventions delivered by teachers. At the conclusion of the study, the teachers were invited to respond to 15 Likert-type questions adapted from the Intervention Rating Profile-15 (Martens et al., 1985) to indicate the acceptability of the video-modeling intervention. The teachers read each statement before selecting a response option on a Likert-type scale ranging from 1 (strongly disagree) to 6 (strongly agree), with higher scores representing higher intervention acceptability.
Other authors have used Likert-type rating scales to assess all three dimensions of social validity. Austin and Bevan (2011) evaluated the effects of a differential-reinforcement-of-low-rates-of-behavior intervention on the rate of attention-seeking behavior displayed by three students. At the end of the study, the researchers invited the teacher to respond to questions about whether students asked for attention too often prior to intervention (assessment of the social validity of intervention goals); whether the intervention was easy to implement, could be easily integrated into classroom routines, and would continue to be used by her (assessment of the social validity of intervention procedures); and whether she thought the children worked more independently and completed more work when the intervention was in place (assessment of the social validity of intervention outcomes). Data were collected using a 5-point Likert-type scale (strongly disagree to strongly agree), with higher scores indicating higher levels of acceptability. The authors also adapted the Likert-type rating scale to collect social-validity data with students. The students were invited to indicate whether they liked the intervention, liked earning points exchanged for reinforcers, and wanted their teacher to keep using the intervention. Students circled faces on a 3-point smiling-faces scale for each question.
Likert-type rating scales are a type of closed-ended social-validity assessment in that they allow participants to select a single response that represents their answer to a question or agreement with a statement. Likert-type social-validity assessments may be relatively easy and fast to implement and may allow for the quantitative analysis of social-validity data and comparison of data across multiple respondents. For example, mean ratings for each participant can be compared across participants or for the same participant over time. A unique example of a pre- and postintervention measure of social validity was provided by Mancuso and Miltenberger (2016), who assessed participants' perceptions of their public speaking skills before and after a habit reversal intervention. Participants rated their confidence and comfort with public speaking before and after intervention, and mean scores were compared to determine whether participants had more positive views of their public speaking postintervention. These data supplemented direct observations of the participants' public speaking skills. However, a noteworthy limitation of Likert-type rating scales is that they do not allow respondents to expand on the reasons for their response selections and thus may not help researchers identify why interventions may or may not be viewed as acceptable or preferred by consumers.
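A sketch of the kind of quantitative summary Likert-type data permit, assuming each respondent's ratings are stored in a simple mapping; the respondents and scores here are invented for illustration.

```python
from statistics import mean

# Hypothetical ratings on a 6-point scale (1 = strongly disagree,
# 6 = strongly agree) from three respondents on an acceptability measure.
ratings = {
    "teacher_1": [5, 6, 5, 4, 6],
    "teacher_2": [4, 4, 5, 5, 4],
    "teacher_3": [6, 6, 6, 5, 6],
}

# Mean ratings can be compared across respondents, or for the same
# respondent before and after intervention.
for respondent, scores in ratings.items():
    print(f"{respondent}: mean acceptability = {mean(scores):.2f}")
```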
Non-Likert-type questionnaires
Non-Likert-type questionnaires were the second most common method of assessing social validity and were used in 33.13% of included studies. For example, Raiff and Dallery (2010) invited participants to complete a treatment-acceptability questionnaire, using a 100-mm visual analogue scale, to rate the ease of use, enjoyment, convenience, helpfulness, and effectiveness of an Internet-based contingency-management program for the management of Type 1 diabetes. Higher numbers on the visual analogue scale were indicative of more favorable perceptions of the intervention. Jones et al. (2019) developed open-ended questions to assess participant perceptions of the acceptability and outcomes of an interdependent group contingency implemented in a classroom setting to reduce students' use of cell phones during instructional periods. Students who participated in the intervention were invited to answer three questions following the intervention: (1) What did you think about not having your phones during class time? (2) Did you feel that you could focus better during class without your phones? and (3) What was your reaction when other students caused the rest of the class to lose their 10 min of free time? Interestingly, the students who participated conveyed an unfavorable view of the interdependent group-contingency procedures because they were discouraged from using their cell phones at school (a measure of the social validity of intervention acceptability). However, these same participants reported that they were satisfied with intervention outcomes because they were better able to sustain their focus in the classroom (a measure of the social validity of the intervention outcome). The use of open-ended questions allowed researchers to gain more information about participants' opinions, which may be helpful in interpreting and understanding the reason for discrepant or unfavorable ratings on closed-ended social-validity assessments.
Direct observations
Direct observations were the third most common method used by researchers to gather social-validity data. Of the total number of studies reporting a measure of social validity, 25.63% included a direct observation measure. For example, before initiating an intervention to improve the safety skills of employees working in a manufacturing setting, Abellon and Wilder (2014) collected data on the workplace behavior displayed by one employee whom the supervisor identified as displaying exemplary safety skills. These data were used to establish socially valid intervention goals (i.e., a performance standard) for the participating employees. Similarly, Stokes et al. (2010) evaluated an intervention to improve the pass-blocking skills of high school American-rules football players. Prior to intervention, the researchers watched video clips of the top-performing players from the previous year and measured their correct performance using a 10-step task analysis. The researchers used data collected from these videos to establish performance goals for players receiving the intervention. In both studies, the behaviors displayed by participants during baseline and intervention were compared with these normative samples to determine how much improvement was made and when performance goals were achieved.
In some studies that used direct observation measures, observers were asked to watch video clips of different interventions and then rate the acceptability of the procedures used. For example, Gibbs et al. (2018) asked the parents of children who received an intervention to reduce vocal stereotypy to watch videos of two different interventions: response interruption and redirection (RIRD) alone or free access to competing stimuli + RIRD. Using an adapted version of the Treatment Evaluation Inventory–Short Form (TEI-SF; Kelley et al., 1989), parents were asked to respond to statements about the acceptability of the procedures used in each condition via a Likert-type rating scale (1 = strongly disagree; 5 = strongly agree). Example statements included "I find this intervention to be an acceptable way of dealing with my child's vocal stereotypy" and "I would be willing to use this intervention at home to address my child's vocal stereotypy."
In other studies, naïve observers were asked to view video samples of participants pre- and postintervention to judge whether the outcomes were meaningful or consistent with behavioral norms or performance standards. For example, Grosberg and Charlop (2017) asked 20 mothers of school-aged children who were unfamiliar with both the purpose of the study and the participants to view video clips of children collected during baseline and intervention and answer questions about the children's play and social skills. After watching each clip, the mothers responded to questions such as "Does the child demonstrate an interest in having a conversation with his/her peers?" and "Would my child want to talk with this peer?" using a 7-point Likert-type rating scale (with 1 being strongly disagree, 4 being neutral, and 7 being strongly agree). The researchers displayed differences between ratings of participants' pre- and postintervention play and social behavior, which were also compared through paired samples t tests conducted with numerical data collected from these questionnaire items. This method of data analysis allowed the researchers to determine the statistical significance (as well as practical significance) of changes in ratings related to the behavior demonstrated by children before and after intervention.
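A sketch of that pre/post comparison, assuming observer ratings are paired by video clip and that SciPy is available; the ratings are invented for illustration.

```python
from scipy import stats

# Hypothetical 7-point observer ratings of the same children's play and
# social skills, paired by clip: baseline versus intervention.
baseline_ratings = [2, 3, 2, 4, 3, 2, 3, 2]
intervention_ratings = [5, 6, 4, 6, 5, 5, 6, 4]

# Paired-samples t test, as in Grosberg and Charlop (2017).
result = stats.ttest_rel(baseline_ratings, intervention_ratings)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```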
As described above, direct observation measures of social validity may be useful for developing intervention goals, assessing the acceptability of intervention procedures with consumers who are not directly receiving or involved in the delivery of interventions, or assessing the degree to which behavior change is meaningful or consistent with behavioral norms (or expected behaviors based on normative samples or comparisons). Direct observations were often used in conjunction with Likert-type rating scales to measure observers' agreement with statements about the acceptability of the intervention procedures used or the relative degree of behavior change. However, the degree to which independent observers rate intervention procedures as acceptable or the degree to which behavior change is consistent with behavioral norms may not necessarily reflect the degree to which the direct recipient of the intervention perceives the procedures to be acceptable and the outcomes meaningful. Rather, such measures more often reflected the degree to which others view the procedures as acceptable and the outcomes meaningful.
Intervention preference or choice
Intervention preference or choice questions were the fourth most common method used to gather social-validity data, reported in 10.63% of included studies. In these studies, direct consumers of the intervention were invited to indicate their preference for different intervention components or choose the intervention they would like to continue with at the conclusion of the study. Notably, respondents did not experience the intervention after indicating their preference or choice. For example, following an intervention to address food selectivity displayed by a child with autism, Allison et al. (2012) asked the child's parent, who did not deliver the intervention, to indicate her preference for two equally effective intervention procedures: escape extinction + differential reinforcement of alternative behavior and escape extinction + noncontingent reinforcement. The authors noted that because both interventions were effective, parent preference might be the most important determinant of which intervention to use. The parent reported that escape extinction + noncontingent reinforcement was more acceptable, easier to implement, and a better fit for her child's needs (measures of the social validity of the intervention procedures). The parent also indicated that she would feel more comfortable implementing escape extinction + noncontingent reinforcement at home and in public settings. Intervention preference or choice assessments might address limitations associated with direct observational methods because they involve the direct recipients of intervention or those who are responsible for implementing intervention outside of the study. Combining measures of intervention preference with open-ended questions about why the specific intervention is preferred may provide researchers with rich information about components of intervention that are viewed as more or less acceptable as well as components of interventions that might continue to be implemented postintervention.
Concurrent-chains intervention preference assessments
Likert-type rating scales, questionnaires, and other methods of collecting social-validity data might not be accessible to people with disabilities or young children who cannot vocally report their preferences. In such cases, concurrent-chains assessments might be used to assess relative preference for different interventions. Concurrent-chains assessments were reported in 9.38% of included studies. In concurrent-chains assessments, participants choose between two or more concurrent interventions. Response options, each associated with a discriminative stimulus (e.g., a colored card), are presented to the participant. Following a selection response (e.g., pointing to a colored card), the participant experiences the intervention associated with that response option. For example, Potter et al. (2013) evaluated preference for interventions designed to increase leisure-item engagement and decrease motor stereotypy with teenagers with developmental disabilities and language delays. Participants were invited to select a colored card corresponding to each intervention. When differential consequences were provided (i.e., the participant experienced the intervention associated with the colored card), preferences were identified. All participants consistently selected the colored card associated with response blocking plus differential access to automatic reinforcement. Although concurrent-chains assessments can provide valuable information about the preferences of individuals with disabilities, the approach may require teaching prerequisite skills (e.g., discrimination between interventions). Leaf et al. (2010) evaluated preferences for different prompting procedures with young children with autism and language delays. The children in this study made inconsistent selections, suggesting they either did not have clear preferences or could not discriminate the interventions associated with each colored card.
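A minimal sketch of how selection responses from a concurrent-chains assessment might be tallied to judge whether a preference is consistent; the session data and the 80% consistency threshold are hypothetical.

```python
from collections import Counter

# Hypothetical selections across 12 sessions; each colored card is the
# discriminative stimulus for one intervention.
selections = ["blue", "blue", "red", "blue", "blue", "blue",
              "blue", "red", "blue", "blue", "blue", "blue"]

card, count = Counter(selections).most_common(1)[0]
if count / len(selections) >= 0.8:  # arbitrary consistency criterion
    print(f"Consistent preference: {card} ({count}/{len(selections)} selections)")
else:
    print("Selections inconsistent; no clear preference")
```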
Interviews
Interviews were the least common assessment method across all studies. They were used to gather social-validity data in 3.13% of included studies. This finding likely reflects the fact that JABA favors the publication of quantitative rather than qualitative data. Interview data are often analyzed using qualitative research methods, such as thematic analysis (Braun & Clarke, 2022), a research method that may be less familiar to behavior analysts. However, there were some noteworthy examples. For example, Gunning et al. (2020) taught parents of typically developing children and children with autism to implement a version of the Preschool Life Skills program with their children at home. At the end of the study, the authors interviewed the children to find out what they thought of the program. The authors reported activities included in the program that the children said they liked (e.g., marble runs, foam building kit). In another example, Nieto and Wiskow (2020) interviewed students following their participation in the Step it UP! game to determine which condition (i.e., no game, Step it UP! game, or Step it UP! game + adult interaction) they liked the most and why. In these studies, the researchers posed brief open-ended questions to participants and recorded their responses. The questions asked were similar to open-ended survey questions, with the main difference being that the researcher asked the questions instead of asking respondents to write down their answers. In most studies that included interviews, short illustrative quotes were provided in the results section or the authors summarized the main findings in one or two sentences.
Overall, these findings add to the literature on the assessment of social validity by defining different methods, reporting data on the prevalence of different methods, and providing examples of how different methods might be used to facilitate the collection of social-validity data. In addition, we highlighted some noteworthy strengths and limitations of these methods. Of note, Carr et al. (1999) and Huntington et al. (2023) did not report data on the methods used to collect social-validity data. Ferguson et al. (2019) noted that a combination of two or more methods (questionnaire, rating scale, intervention choice, or other) was most commonly used to collect social-validity data (48% of included studies). Ferguson et al. noted that rating scales were used in 21% of studies, followed by questionnaires (17%), other methods (8%), and intervention choice (6%). We extended the findings of Ferguson et al. by disaggregating this information and providing data on the exact number of studies that included each type of measure. We also reported data on additional methods for collecting social-validity data, including direct observations, concurrent-chains intervention preference assessments, and interviews.
Understanding the methods used to collect social-validity data is important for several reasons. First, providing detailed descriptions of data collection methods may enable other researchers to replicate or adapt the methods for similar research questions, which may enhance the future reporting of social-validity data. Second, different methods for collecting social-validity data are likely to have different strengths and limitations. Knowing the specific methods that might be used may help practitioners and researchers evaluate the degree to which a specific method might be useful with a specific respondent and the extent to which the data they collect represent the constructs being measured. Third, the choice of data collection methods may influence the interpretation of results. For example, qualitative methods (e.g., interviews or open-ended questions) may provide richer insights into participants' perspectives, whereas quantitative methods (e.g., Likert-type rating scales) may yield more precise numerical data. Finally, providing a description of the specific methods used in social-validity assessments may be valuable for practitioners who wish to implement similar assessments in real-world settings. By providing examples of how different methods have been used in behavior-analytic research, practitioners may be better able to select and adapt different methods for use in their work.
Respondents who provided social-validity data
Table 3 depicts the number and percentage of studies that gathered social-validity data from different groups of respondents. In the current study, participants who received the intervention were the most common respondents for social-validity assessments. Of the total number of studies that reported a measure of social validity, 55.63% gathered data from the participants themselves.
The demographics of the participants varied substantially, ranging from young children to adults in a range of contexts including homes, schools, employment settings, and disability programs. For example, Fogel et al. (2010) evaluated the effects of exergaming on students' physical activity in a school physical education class. Several different exergaming programs were provided to the students. At the end of the study, the researchers asked the participants to rank order the exergames from most to least preferred. This allowed the researchers to identify differences in preference among the participating students. Erath et al. (2020) taught 25 human-services staff working in a residential services program for adults with disabilities to implement behavioral skills training to teach job skills to newly hired program staff. At the conclusion of the training, participants were invited to respond to questions about their experiences with the training using a modified version of the Intervention Rating Profile-15 (Martens et al., 1985). Finally, studies that employed concurrent-chains intervention preference assessments (e.g., Potter et al., 2013) allowed individuals with disabilities and communication delays to express their preferences for different interventions by providing opportunities for them to choose which intervention context they would like to experience.
Participants who were responsible for delivering the intervention provided social-validity data in 25.63% of included studies. Lerman et al. (2013) coached adults with disabilities to deliver teaching programs to young children with autism as part of a vocational training program. At the conclusion of the role-play portion of the training, the participants were invited to complete a Likert-type rating scale to answer questions including "I like the methods used to train me," "These training methods were effective in teaching me new skills," and "I would feel comfortable using these skills with children." In another study, Allen and Wallace (2013) taught dentists to use a fixed-time schedule of breaks to decrease escape-maintained challenging behavior displayed by children attending a local dental clinic. The dentists were invited to complete a modified version of the Treatment Evaluation Inventory-Short Form (TEI-SF; Kelley et al., 1989) to provide feedback on how acceptable the treatment was, how willing they would be to use the procedure, and how much they liked the procedure.
Social-validity data were gathered from parents or caregivers not directly involved in delivering the intervention in 16.88% of included studies. For example, Rubio et al. (2020) evaluated the effects of a finger prompt on the food acceptance and refusal behavior of children attending a day treatment program for the assessment and treatment of avoidant/restrictive food intake disorder. Parents were invited to observe the intervention and respond to a set of Likert-type questions, such as "I was comfortable with this treatment for my child" and "I feel my child is now accepting more food (amount and/or variety) during mealtimes than before this treatment." Following this feedback, parents received training on how to implement the procedure with their children at home.
Gibbs et al. (2018) evaluated the effects of noncontingent music and RIRD on vocal stereotypy displayed by two children with autism. After completion of the intervention, the parents of the participants were invited to view video clips of their child during RIRD alone and RIRD + music. After viewing the recording, the parents responded to Likert-type questions adapted from the TEI-SF (Kelley et al., 1989) measuring treatment acceptability for each condition. Example questions included "I find this intervention to be an acceptable way of dealing with my child's vocal stereotypy" and "I believe that my child experiences discomfort during this intervention." Both parents expressed a preference for using RIRD + music at home.
Educators, therapists, instructors, or other professionals who had a relationship with the participant but who were not directly involved in delivering the intervention provided social-validity data in 15.63% of included studies. For example, Luczynski and Hanley (2013) taught communication and social skills to preschool-aged children at risk for the development of challenging behavior. At the end of the study, the authors invited the assistant director of quality assurance for all local preschools, the director of the preschool that participants attended, and the lead and assistant classroom teachers who worked directly with the participants to view video clips of the children during baseline and intervention sessions and respond to a series of Likert-type questions about the goals, procedures, and outcomes of the intervention.
Finally, individuals who were not involved in the study or were naïve to the purpose of the study provided social-validity data in 21.88% of included studies. For example, Howard and DiGennaro Reed (2014) coached animal shelter staff to conduct obedience training with hard-to-adopt shelter dogs. At the conclusion of the training, the researchers recruited potential adopters, shelter staff, and animal trainers employed by or volunteering at the shelter but who were not involved in the research to view video clips of the trainer and dog interacting during baseline and training sessions. Respondents were asked to answer questions about the acceptability of the training methods observed using a Likert-type rating scale and to select which video (before or after training) they considered "better" along five dimensions: (a) effectiveness of trainer, (b) desirability of trainer, (c) adoptability of dog, (d) which dog would be better with children, and (e) which dog would be better for a first-time pet owner. Tai and Miltenberger (2017) used behavioral skills training to teach safe tackling skills to youth American-rules football players. At the conclusion of the study, a youth football coach who was naïve to the purpose of the study viewed videos of the tackles made by the participants during baseline and intervention sessions and was asked to select the video depicting the safer tackle.
Collecting these data allowed us to extend the methods used by Carr et al. (1999), Ferguson et al. (2019), and Huntington et al. (2023), who did not report data on respondent types. Our findings were consistent with those reported by Snodgrass et al. (2018), who found that the direct recipients of the intervention most commonly provided social-validity data. Understanding the source of social-validity assessment data may allow for a more comprehensive assessment of the credibility and reliability of the information. Different individuals or groups may have varying perspectives on and vested interests in the social validity of an intervention. For example, participants who receive an intervention may comment on their preference for the intervention, how much the intervention helped them achieve their unique goals, and how participating in the intervention fits into their daily life. In contrast, opinions provided by parents, teachers, or health care professionals may offer different viewpoints on the acceptability and effectiveness of an intervention based on other factors, such as ease of implementation and cost effectiveness. Knowing who provided the data may help readers determine the extent to which the findings related to social validity can be generalized to broader populations. Finally, in some cases, knowing who provided the social-validity assessment data can reveal potential conflicts of interest. This is especially important in cases where financial or personal interests may influence the assessment of an intervention.
Another potential concern arises when researchers collect social-validity data by directly asking participants (e.g., the direct recipients of the intervention or parents of the direct recipient) to rate the quality of services provided. This method introduces a potential bias, as the person providing and evaluating the intervention is the one soliciting feedback, possibly exerting pressure on participants to provide favorable responses. In future research, researchers might mitigate this concern by implementing strategies to minimize bias. For example, if participants are aware that the researcher who delivered the services is gathering feedback, transparency can be maintained by ensuring that participants understand the purpose of the evaluation and emphasizing the importance of honest feedback. Additionally, researchers can employ measures such as anonymous surveys or third-party data collection to reduce the influence of social-desirability bias and encourage participants to provide genuine responses without feeling pressured to be overly positive.
Social-validity measurement points
Table 4 depicts the percentage of total studies that collected social-validity data at different points. Most studies (83.13%) assessed social validity at or after the conclusion of the study. At that point, participants who received the intervention or delivered the intervention were often provided with a Likert-type rating scale and asked to rate their agreement with statements about the intervention goals, procedures, and/or outcomes. For example, at the conclusion of the study, Hanley et al. (2014) administered a four-item rating scale to parents whose children participated in an intervention to reduce challenging behavior and increase functional replacement behaviors. To supplement information gathered by the families following the intervention, Hanley et al. reported data on the time and cost associated with the intervention. Although not directly related to participant or family perceptions about the goals, procedures, and outcomes of the intervention, providing representative data on time and cost might help influence public perceptions about the social validity of the intervention, particularly if the intervention is publicly funded.
A much smaller number of studies conducted social-validity assessments prior to the start of the intervention (9.38%) or during the intervention (13.75%). As discussed above, conducting direct observations of peers prior to intervention might help researchers develop socially valid goals that reflect developmentally or contextually appropriate behavior. These observations can also inform the development and implementation of the intervention by helping researchers to define the target behaviors of interest or by providing a performance standard (or terminal goal) against which to evaluate the participant's progress. Carlile et al. (2018) provided a unique example of a social-validity preassessment conducted with children who were not involved with the study. Prior to implementing an intervention to teach six school-aged children with autism to request help when lost, the researchers asked 45 similar-aged typically developing peers to answer open-ended questions about what it meant to be lost, what to do when lost, and their use of cell phones. The data collected from this assessment were used to develop the individualized target behavior definitions for each participant. In another example, Downs et al. (2015) asked a certified yoga instructor to review and provide feedback on a task analysis for teaching yoga postures prior to implementing a video self-evaluation intervention for improving yoga postures with two adult yoga students.
Studies that used concurrent-chains intervention preference assessments most often collected social-validity data during intervention. In other words, participants were provided with the opportunity to select a schedule-correlated stimulus associated with a specific intervention and then experience the intervention following selection. Although these types of social-validity assessments were coded as occurring during the intervention, they often occurred after the researchers introduced and assessed the efficacy of different interventions for the participant. Campbell and Anderson (2011) provided a unique example of a social-validity assessment conducted with teachers during the delivery of the intervention, using a Likert-type rating scale and questionnaire. Teachers were coached to deliver a Check-In Check-Out intervention with four students who displayed challenging behavior that resulted in office disciplinary referrals. Teachers' perceptions of changes in student challenging behavior (outcomes) were assessed once or twice a week throughout the study using a two-item rating scale. Additionally, the contextual fit of the intervention was assessed with the teachers during the initial implementation phase and at the end of the study using the Contextual Fit Questionnaire (Horner et al., 2003). This questionnaire asked teachers to provide feedback on the ease of implementation of the intervention, the amount of effort required to implement the intervention, and whether the effects of the intervention were worth the effort. The researchers made modifications to the intervention on an ongoing basis in response to the information provided by teachers via social-validity assessments.
These findings were similar to those reported by Snodgrass et al. (2018), who found that social-validity assessments were most commonly conducted at or after the conclusion of the intervention. Knowing the point at which social-validity assessment data were collected may be important for several reasons. Social-validity data collected at different points can provide insights into whether and how an intervention has been adapted or modified in response to feedback from participants. This can shed light on the dynamic nature of intervention development and implementation. In the current study, most social-validity assessments were found to be conducted at the conclusion of the study. Thus, social-validity data may not be commonly used in research to inform the development of interventions or changes to an intervention during a study (although these data may inform the development of subsequent studies). Additionally, over time a participant's perceptions and expectations of an intervention may change. Knowing the point of data collection helps identify potential response shifts, where participants' initial expectations or judgments may evolve as they experience the intervention. Although we only looked at the points at which social-validity data were collected within studies, it may be equally important to look at the points at which social-validity data are collected across studies and years. An intervention considered socially valid during one period may become less acceptable due to changing societal attitudes and norms (see Barnes, 2019), individual or collective beliefs (see King et al., 2006), or global health and economic conditions (see Nicolson et al., 2020). Social-validity assessments may be useful in identifying these changes.
GENERAL DISCUSSION
In this review, we replicated and extended the procedures described by Carr et al. (1999), Snodgrass et al. (2018), Ferguson et al. (2019), and Huntington et al. (2023) to systematically identify the prevalence and type of social-validity assessments published in JABA between 2010 and 2020. We found that the percentage of studies including social-validity assessments was relatively stable between 2010 and 2019, with a marked increase in 2020. We found that social-validity measures designed to assess the acceptability of intervention procedures and outcomes were most common, with relatively fewer studies assessing the acceptability of the intervention goals. Likert-type rating scales were the most commonly used method for collecting social-validity data, followed by non-Likert-type questionnaires. In addition to prevalence and type, we reported data on respondent characteristics and the points at which social-validity data were collected during the study. We found that in over half of the included studies, the direct recipients of the intervention provided information about the social validity of the intervention's goals, procedures, or outcomes. Social-validity data were less commonly provided by people who delivered the intervention (e.g., parents, teachers, coaches), people who had a relationship with the direct recipient of the intervention but did not deliver the intervention (e.g., parents, teachers, therapists), or people who did not have a relationship with the direct recipient of the intervention and were not involved in the study (e.g., undergraduate students, Board Certified Behavior Analysts, health professionals, coaches, employers). Most social-validity assessments were conducted at the conclusion of the intervention.
Recommendations to increase the collection and reporting of social-validity data in behavior-analytic research have been made consistently (Baer et al., 1987; Detrich, 2018; Hanley, 2010; Schwartz & Baer, 1991; Wolf, 1978), yet the current findings demonstrate that social-validity assessments are still relatively infrequently employed as primary or secondary measures in research published in JABA. There are several potential reasons why this might be the case. First, Carr et al. (1999) noted that behavior-analytic journals do not provide recommendations about when and how to report social-validity data, nor do they require such measures for publication, both of which may contribute to the underreporting of such data. Second, editors and reviewers of behavior-analytic journals may prioritize the collection and reporting of data on the effectiveness of interventions rather than more subjective measures of the perceived acceptability and value of these interventions. Indeed, JABA's author guidelines (Journal of Applied Behavior Analysis, n.d.) state that the primary focus of JABA is on research studies demonstrating socially important functional relations. Although the author guidelines currently state that the clinical significance of the effects for individuals should be discussed, they also note that direct measures of behavior are critical for the acceptance of research in the journal. Concurrent-chains intervention preference assessments provide one direct measure of the potential social acceptability of intervention procedures. However, most social-validity assessments rely on subjective measures such as personal opinions. Therefore, authors may wonder whether personal opinions about the goals, procedures, or outcomes of an intervention are appropriate for publication in JABA. We encourage the editorial board of JABA to consider providing clearer advice on when and how to report measures of social validity in research studies.
Third, Huntington et al. (2023) noted that the variety of terms used in the literature to describe social validity (e.g., satisfaction, preference, acceptability) may make it challenging to identify, compare, and contrast social-validity assessments and outcomes. Huntington et al. argued that imprecise use of terms to describe social validity may be inconsistent with a behavior-analytic commitment to technical descriptions of intervention procedures and research methods. As noted above, the JABA author guidelines recommend that authors describe the clinical significance of behavior change. However, the term "clinical significance" is not defined, and its relation to social validity is unclear. To address this challenge, we have attempted to provide more precise definitions for the dimensions of social validity (Table 1), the methods of collecting social-validity data (Table 2), the groups of respondents who might provide social-validity data (Table 3), and the points at which social-validity data might be collected (Table 4). We encourage researchers to clearly describe the methods and procedures used to collect social-validity data in future studies and hope the definitions provided in the current study will help increase consistency in the use of these terms and concepts.
Recommendations
Based on the findings of the current study, we believe there are meaningful steps that researchers can take to improve the reporting of social validity in JABA. In what follows, we provide three practical recommendations for potentially improving the collection and reporting of information about the social validity of interventions.
Recommendation #1: Integrate social validity and informed consent procedures
Behavior analysts adhering to the Behavior Analyst Certification Board Ethics Code (2020) have an ethical responsibility to obtain informed consent from clients and participants before engaging in behavioral assessments, interventions, or changes in intervention design. Additionally, behavior analysts have an ethical responsibility to respect and actively promote client choice and self-determination to the best of their abilities, particularly when providing services to vulnerable populations. To obtain informed consent, it is important for researchers and practitioners to clearly explain the goals, procedures, and anticipated outcomes associated with the delivery of the intervention. Thus, the process of obtaining informed consent may provide opportunities to gather data on the social validity of the intervention prior to its implementation. However, in the current study, few studies reported formal measures of social validity prior to the start of the intervention (although such measures might be collected but omitted from published research). We encourage researchers to integrate social-validity measures into the informed consent process. We recommend that researchers describe how social validity was assessed before the intervention (e.g., via interviews or questionnaires) and what changes were made to aspects of the intervention (e.g., the intervention procedures used) based on social-validity data. Researchers might ask participants to consent to the collection and reporting of data on how the intervention goals and procedures were developed and changed in response to participant feedback prior to implementation, or they might report the number of participants who declined to participate following a description of the intervention procedures.
Recommendation #2: Incorporate ongoing assessments of social validity
Behavior analysis is distinct from other fields of psychological study in its emphasis on understanding idiosyncratic functional relations between an organism's behavior and the environments within which it occurs (Skinner, 1953). This has led to the prioritization of direct behavioral assessments as well as the use of single-case research methods that allow for rigorous and reliable exploration of behavioral variability as the datum of interest (Sidman, 1960). These methods allow for the elaboration of broader behavioral principles while actively informing ongoing intervention and treatment decisions (Kazdin, 2021). However, this same approach has not been applied to social-validity assessment. Most social-validity assessments were conducted at the conclusion of the intervention, and, therefore, the data may not be used to inform changes to the goals of the intervention or the intervention design during the study. We recommend that researchers and practitioners adopt an ongoing approach to social-validity assessment during the intervention. Assessments of the social validity of the goals of the intervention, the procedures used to deliver the intervention, and the outcomes of the intervention should ideally be conducted at various stages throughout the intervention process. Understanding stakeholders' perspectives at various stages can help ensure that the intervention aligns with their values, priorities, and needs. For example, researchers and practitioners might assess stakeholders' perceptions of the intervention procedures, such as the clarity of instructions, the feasibility of implementation, the acceptability of the delivery format, and the appropriateness of the intervention activities. In some cases, it may be useful to collect social-validity data from multiple stakeholders (e.g., parents, teachers, therapists) at different points to identify and address disagreements related to the goals of the intervention, the acceptability of the intervention procedures, or the importance of the outcomes. Regular and systematic assessment of social validity throughout the intervention timeline may help promote stakeholder engagement, improve intervention design and delivery, and enhance the overall efficacy of the intervention.
We also recommend that researchers and practitioners explore ways in which personalized and idiosyncratic measures of social validity can be incorporated alongside existing Likert-type questionnaires. Some meaningful examples have been presented in the published research in which behavior analysts have assessed, defined, and then measured personalized and idiosyncratic behaviors that may be indicative of the social validity of an intervention. For example, Green and Reid (1996), Parsons et al. (2012), and Ramey et al. (2023) demonstrated that personalized indices of happiness and unhappiness could be operationally defined and reliably measured. In addition, the concept of "happy, relaxed, and engaged" may provide a useful heuristic to support the personalization of measures of social validity (see Gover et al., 2022). Finally, developing and incorporating novel applications of concurrent-chains intervention preference assessments, such as the enhanced choice model (Rajaraman et al., 2022), may be useful for refining the measurement and reporting of the social validity of interventions for the direct recipient of the intervention. Rajaraman et al. (2022) demonstrated that ongoing assessment of social validity could be implemented with children by providing them with concurrent, continuously available options to (a) experience skill-based treatment for their challenging behavior (intervention context), (b) experience noncontingent reinforcement (hangout context), or (c) leave the intervention setting altogether. By regularly assessing the child's choice, the researchers could alter the skill-based treatment context (including the schedule and type of demands and reinforcers presented) to ensure that it included components the child preferred and to enhance the child's willingness to participate in the intervention.
Recommendation #3: Include open-ended response options
In the current study, the most common method of collecting social-validity data was via Likert-type rating scales, a closed-ended assessment method. Including open-ended response options in social-validity assessments may be a valuable way to gather qualitative data and in-depth feedback from participants. These open-ended responses can provide insights, context, and nuanced perspectives that closed-ended questions may not capture (Fryling & Baires, 2016). Asking open-ended questions of participants with repertoires of vocalized verbal behavior and recording their responses (see Nieto & Wiskow, 2020) may be one way to reduce the response effort required for participants to engage in social-validity assessments. Researchers might ask open-ended questions to learn about participants' perceptions of the benefits of the intervention and any adverse or unexpected effects associated with the intervention. To report data, researchers might consider including illustrative quotes from participants in text or in a table.
Limitations and future research
Some limitations of the current study warrant mention. First, we only systematically identified and appraised social-validity assessments published in JABA. Thus, our findings may not be representative of all relevant studies including social-validity assessments in the behavior-analytic literature, and future research could apply similar procedures to additional journals. Second, we did not report data on the settings in which interventions were conducted. The dimensions of social validity assessed and the methods used to gather these data may differ among university-based clinics, schools, and community-based settings. In the future, researchers could explore differences in social-validity measurement and reporting across settings, including differences in respondents across settings. Third, we included both open-ended questions and non-Likert-type closed-ended questions in our definition of non-Likert-type questionnaires (see Table 2), a limitation of our data-coding procedures. Open-ended questions often yield qualitative data that require different analytical approaches than those used for closed-ended questions. Future researchers could focus on developing more precise coding methods. Knowing the characteristics of individuals who find an intervention acceptable and effective might also help researchers tailor interventions to better meet the needs of specific populations. Finally, the studies included in this review used a wide range of methods to collect social-validity data. Future researchers may wish to conduct a more in-depth review of individual methods used to collect social-validity data and the outcomes reported. The findings of such reviews might help practitioners and researchers identify when and how to conduct various types of social-validity assessments and may establish a more robust evidence base for the social validity of behavior-analytic interventions.
ACKNOWLEDGMENT
Open access publishing facilitated by Monash University, as part of the Wiley – Monash University agreement via the Council of Australian University Librarians.
CONFLICT OF INTEREST STATEMENT
The authors do not have any conflicts of interest to declare.
DATA AVAILABILITY STATEMENT
The data collected as part of this study can be found in the Additional Supporting Information in the online version of this article at the publisher's website.
ETHICS APPROVAL
No human or animal subjects were used to produce this article.
ORCID
Erin S. Leif https://orcid.org/0000-0003-2219-2405
Bradley S. Bloomfield https://orcid.org/0000-0002-5792-5480
Russell A. Fox https://orcid.org/0000-0002-3061-3495
REFERENCES
An asterisk denotes studies that were included in the current review. A full list of included studies can be found in the Supporting Information in the online version of this article at the publisher's website.
*Abellon, O. E., & Wilder, D. A. (2014). The effect of equipment proximity on safe performance in a manufacturing setting. Journal of Applied Behavior Analysis, 47(3), 628–632. https://doi.org/10.1002/jaba.137
*Allen, K. D., & Wallace, D. P. (2013). Effectiveness of using noncontingent escape for general behavior management in a pediatric dental clinic. Journal of Applied Behavior Analysis, 46(4), 723–737. https://doi.org/10.1002/jaba.82
*Allison, J., Wilder, D. A., Chong, I., Lugo, A., Pike, J., & Rudy, N. (2012). A comparison of differential reinforcement and noncontingent reinforcement to treat food selectivity in a child with autism. Journal of Applied Behavior Analysis, 45(3), 613–617. https://doi.org/10.1901/jaba.2012.45-613
*Austin, J. L., & Bevan, D. (2011). Using differential reinforcement of low rates to reduce children's requests for teacher attention. Journal of Applied Behavior Analysis, 44(3), 451–461. https://doi.org/10.1901/jaba.2011.44-451
Baer, D. M., Wolf, M. M., & Risley, T. R. (1987). Some still-current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 20(4), 313–327. https://doi.org/10.1901/jaba.1987.20-313
Barnes, C. (2019). Understanding the social model of disability: Past, present and future. In N. Watson, A. Roulstone, & C. Thomas (Eds.), Routledge handbook of disability studies (pp. 14–31). Routledge.
Behavior Analyst Certification Board. (2020). Ethics code for behavior analysts. https://bacb.com/wp-content/ethics-code-for-behavior-analysts/
Braun, V., & Clarke, V. (2022). Conceptual and design thinking for thematic analysis. Qualitative Psychology, 9(1), 3–26. https://psycnet.apa.org/doi/10.1037/qup0000196
Callahan, K., Hughes, H. L., Mehta, S., Toussaint, K. A., Nichols, S. M., Ma, P. S., Kutlu, M., & Wang, H. T. (2017). Social validity of evidence-based practices and emerging interventions in autism. Focus on Autism and Other Developmental Disabilities, 32(3), 188–197. https://doi.org/10.1177/1088357616632446
*Campbell, A., & Anderson, C. M. (2011). Check-in/check-out: A systematic evaluation and component analysis. Journal of Applied Behavior Analysis, 44(2), 315–326. https://doi.org/10.1901/jaba.2011.44-315
*Carlile, K. A., DeBar, R. M., Reeve, S. A., Reeve, K. F., & Meyer, L. S. (2018). Teaching help-seeking when lost to individuals with autism spectrum disorder. Journal of Applied Behavior Analysis, 51(2), 191–206. https://doi.org/10.1002/jaba.447
Carr, J. E., Austin, J. L., Britton, L. N., Kellum, K. K., & Bailey, J. S. (1999). An assessment of social validity trends in applied behavior analysis. Behavioral Interventions, 14(4), 223–231. https://doi.org/10.1002/(SICI)1099-078X(199910/12)14:4
Carter, S. L., & Wheeler, J. J. (2019). The social validity manual. Elsevier Science & Technology.
Common, E. A., & Lane, K. L. (2017). Social validity assessment. In J. K. Luiselli (Ed.), Applied behavior analysis advanced guidebook: A manual for professional practice (pp. 73–92). Academic Press. https://doi.org/10.1016/B978-0-12-811122-2.00004-8
Cook, B. G., Cook, L., & Landrum, T. J. (2013). Moving research into practice: Can we make dissemination stick? Exceptional Children, 79(3), 163–180. https://doi.org/10.1177/001440291307900203
Detrich, R. (2018). Rethinking dissemination: Storytelling as a part of the repertoire. Perspectives on Behavior Science, 41(2), 541–549. https://doi.org/10.1007/s40614-018-0160-y
*DiGennaro-Reed, F. D., Codding, R., Catania, C. N., & Maguire, H. (2010). Effects of video modeling on intervention integrity of behavioral interventions. Journal of Applied Behavior Analysis, 43(2), 291–295. https://doi.org/10.1901/jaba.2010.43-291
*Downs, H. E., Miltenberger, R., Biedronski, J., & Witherspoon, L. (2015). The effects of video self-evaluation on skill acquisition with yoga postures. Journal of Applied Behavior Analysis, 48(4), 930–935. https://doi.org/10.1002/jaba.248
*Erath, T. G., DiGennaro Reed, F. D., Sundermeyer, H. W., Brand, D., Novak, M. D., Harbison, M. J., & Shears, R. (2020). Enhancing the training integrity of human service staff using pyramidal behavioral skills training. Journal of Applied Behavior Analysis, 53(1), 449–464. https://doi.org/10.1002/jaba.608
Ferguson, J. L., Cihon, J. H., Leaf, J. B., Van Meter, S. M., McEachin, J., & Leaf, R. (2019). Assessment of social validity trends in the Journal of Applied Behavior Analysis. European Journal of Behavior Analysis, 20(1), 146–157. https://doi.org/10.1080/15021149.2018.1534771
*Fogel, V. A., Miltenberger, R. G., Graves, R., & Koehler, S. (2010). The effects of exergaming on physical activity among inactive children in a physical education classroom. Journal of Applied Behavior Analysis, 43(4), 591–600. https://doi.org/10.1901/jaba.2010.43-591
Fryling, M. J., & Baires, N. A. (2016). The practical importance of the distinction between open and closed-ended indirect assessments. Behavior Analysis in Practice, 9(2), 146–151. https://doi.org/10.1007/s40617-016-0115-2
*Gibbs, A. R., Tullis, C. A., Thomas, R., & Elkins, B. (2018). The effects of noncontingent music and response interruption and redirection on vocal stereotypy. Journal of Applied Behavior Analysis, 51(4), 899–914. https://doi.org/10.1002/jaba.485
Gover, H. C., Staubitz, J. E., & Juárez, A. P. (2022). Revisiting reinforcement: A focus on happy, relaxed, and engaged students. TEACHING Exceptional Children, 55(1), 72–74. https://doi.org/10.1177/00400599221123185
Green, C. W., & Reid, D. H. (1996). Defining, validating, and increasing indices of happiness among people with profound multiple disabilities. Journal of Applied Behavior Analysis, 29(1), 67–78. https://doi.org/10.1901/jaba.1996.29-67
*Grosberg, D., & Charlop, M. H. (2017). Teaching conversational speech to children with autism spectrum disorder using text-message prompting. Journal of Applied Behavior Analysis, 50(4), 789–804. https://doi.org/10.1002/jaba.403
*Gunning, C., Holloway, J., & Grealish, L. (2020). An evaluation of parents as behavior change agents in the Preschool Life Skills program. Journal of Applied Behavior Analysis, 53(2), 889–917. https://doi.org/10.1002/jaba.660
Hanley, G. P. (2010). Toward effective and preferred programming: A case for the objective measurement of social validity with recipients of behavior-change programs. Behavior Analysis in Practice, 3(1), 13–21. https://doi.org/10.1007/BF03391754
*Hanley, G. P., Jin, C. S., Vanselow, N. R., & Hanratty, L. A. (2014). Producing meaningful improvements in problem behavior of children with autism via synthesized analyses and treatments. Journal of Applied Behavior Analysis, 47(1), 16–36. https://doi.org/10.1002/jaba.106
Horner, R., Salantine, S., & Albin, R. (2003). Self-assessment of contextual fit in schools. Educational and Community Supports.
*Howard, V. J., & DiGennaro Reed, F. D. (2014). Training shelter volunteers to teach dog compliance. Journal of Applied Behavior Analysis, 47(2), 344–359. https://doi.org/10.1002/jaba.120
Huntington, R. N., Badgett, N. M., Rosenberg, N. E., Greeny, K., Bravo, A., Bristol, R. M., Byun, Y. H., & Park, M. S. (2023). Social validity in behavioral research: A selective review. Perspectives on Behavior Science, 46(1), 201–215. https://doi.org/10.1007/s40614-022-00364-9
*Jones, M. E., Allan Allday, R., & Givens, A. (2019). Reducing adolescent cell phone usage using an interdependent group contingency. Journal of Applied Behavior Analysis, 52(2), 386–393. https://doi.org/10.1002/jaba.538
Journal of Applied Behavior Analysis. (n.d.). Author guidelines. https://onlinelibrary.wiley.com/page/journal/19383703/homepage/forauthors.html
Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior change through social validation. Behavior Modification, 1(4), 427–452. https://doi.org/10.1177/014544557714001
Kazdin, A. E. (2021). Single-case experimental designs: Characteristics, changes, and challenges. Journal of the Experimental Analysis of Behavior, 115(1), 56–85. https://doi.org/10.1002/jeab.638
Kelley, M. L., Heffer, R. W., Gresham, F. M., & Elliott, S. N. (1989). Development of a modified intervention evaluation inventory. Journal of Psychopathology and Behavioral Assessment, 11(3), 235–247. https://doi.org/10.1007/BF00960495
Kern, L., & Manz, P. (2004). A look at current validity issues of school-wide behavior support. Behavioral Disorders, 30(1), 47–59. https://doi.org/10.1177/019874290403000102
King, G. A., Zwaigenbaum, L., King, S., Baxter, D., Rosenbaum, P., & Bates, A. (2006). A qualitative investigation of changes in the belief systems of families of children with autism or Down syndrome. Child: Care, Health and Development, 32(3), 353–369. https://doi.org/10.1111/j.1365-2214.2006.00571.x
*Leaf, J. B., Sheldon, J. B., & Sherman, J. A. (2010). Comparison of simultaneous prompting and no-no prompting in two-choice discrimination learning with children with autism. Journal of Applied Behavior Analysis, 43(2), 215–228. https://doi.org/10.1901/jaba.2010.43-215
Leko, M. M. (2014). The value of qualitative methods in social validity research. Remedial and Special Education, 35(5), 275–286. https://doi.org/10.1177/0741932514524002
*Lerman, D. C., Hawkins, L., Hoffman, R., & Caccavale, M. (2013). Training adults with an autism spectrum disorder to conduct discrete-trial training for young children with autism: A pilot study. Journal of Applied Behavior Analysis, 46(2), 465–478. https://doi.org/10.1002/jaba.50
Lloyd, J. W., & Heubusch, J. D. (1996). Issues of social validation in research on serving individuals with emotional or behavioral disorders. Behavioral Disorders, 22(1), 8–14. https://doi.org/10.1177/019874299602200105
*Luczynski, K. C., & Hanley, G. P. (2013). Prevention of problem behavior by teaching functional communication and self-control skills to preschoolers. Journal of Applied Behavior Analysis, 46(2), 355–368. https://doi.org/10.1002/jaba.44
*Mancuso, C., & Miltenberger, R. G. (2016). Using habit reversal to decrease filled pauses in public speaking. Journal of Applied Behavior Analysis, 49(1), 188–192. https://doi.org/10.1002/jaba.267
*Mann, C. C., & Karsten, A. M. (2020). Efficacy and social validity of procedures for improving conversational skills of college students with autism. Journal of Applied Behavior Analysis, 53(1), 402–421. https://doi.org/10.1002/jaba.600
Martens, B. K., Witt, J. C., Elliott, S. N., & Darveaux, D. X. (1985). Teacher judgments concerning the acceptability of school-based interventions. Professional Psychology: Research and Practice, 16(2), 191–198. https://doi.org/10.1037/0735-7028.16.2.191
Nicolson, A. C., Lazo-Pearson, J. F., & Shandy, J. (2020). ABA finding its heart during a pandemic: An exploration in social validity. Behavior Analysis in Practice, 13(4), 757–766. https://doi.org/10.1007/s40617-020-00517-9
*Nieto, P., & Wiskow, K. M. (2020). Evaluating adult interaction during the Step It UP! game to increase physical activity in children. Journal of Applied Behavior Analysis, 53(3), 1354–1366. https://doi.org/10.1002/jaba.699
Parsons, M. B., Reid, D. H., Bentley, E., Inman, A., & Lattimore, L. P. (2012). Identifying indices of happiness and unhappiness among adults with autism: Potential targets for behavioral assessment and intervention. Behavior Analysis in Practice, 5(1), 15–25. https://doi.org/10.1007/BF03391814
*Potter, J. N., Hanley, G. P., Augustine, M., Clay, C. J., & Phelps, M. C. (2013). Treating stereotypy in adolescents diagnosed with autism by refining the tactic of "using stereotypy as reinforcement." Journal of Applied Behavior Analysis, 46(2), 407–423. https://doi.org/10.1002/jaba.52
*Raiff, B. R., & Dallery, J. (2010). Internet-based contingency management to improve adherence with blood glucose testing recommendations for teens with Type 1 diabetes. Journal of Applied Behavior Analysis, 43(3), 487–491.
Rajaraman, A., Hanley, G. P., Gover, H. C., Staubitz, J. L., Staubitz, J. E., Simcoe, K. M., & Metras, R. (2022). Minimizing escalation by treating dangerous problem behavior within an enhanced choice model. Behavior Analysis in Practice, 15(1), 219–242. https://doi.org/10.1007/s40617-020-00548-2
Ramey, D., Healy, O., & McEnaney, E. (2023). Defining and measuring indices of happiness and unhappiness in children diagnosed with autism spectrum disorder. Behavior Analysis in Practice, 16(1), 194–209. https://doi.org/10.1007/s40617-022-00710-y
Reimers, T., Wacker, D., & Koeppl, G. (1987). Acceptability of behavioral interventions: A review of the literature. School Psychology Review, 16(2), 212–227. https://doi.org/10.1080/02796015.1987.12085286
*Rubio, E. K., Volkert, V. M., Farling, H., & Sharp, W. G. (2020). Evaluation of a finger prompt variation in the treatment of pediatric feeding disorders. Journal of Applied Behavior Analysis, 53(2), 956–972. https://doi.org/10.1002/jaba.658
Schwartz, I. S., & Baer, D. M. (1991). Social validity assessments: Is current practice state of the art? Journal of Applied Behavior Analysis, 24(2), 189–204. https://doi.org/10.1901/jaba.1991.24-189
Sidman, M. (1960). Tactics of scientific research: Evaluating experimen- tal data in psychology. Basic Books.
Skinner, B. F. (1953). Science and human behavior. Macmillan.
Snodgrass, M. R., Chung, M. Y., Meadan, H., & Halle, J. W. (2018). Social validity in single-case research: A systematic literature review of prevalence and application. Research in Developmental Disabilities, 74, 160–173. https://doi.org/10.1016/j.ridd.2018.01.007
*Stokes, J. V., Luiselli, J. K., & Reed, D. D. (2010). A behavioral intervention for teaching tackling skills to high school football athletes. Journal of Applied Behavior Analysis, 43(3), 509–512. https://doi.org/10.1901/jaba.2010.43-509
*Tai, S. S., & Miltenberger, R. G. (2017). Evaluating behavioral skills training to teach safe tackling skills to youth football players. Journal of Applied Behavior Analysis, 50(4), 849–855. https://doi.org/10.1002/jaba.412
Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11(2), 203–214. https://doi.org/10.1901/jaba.1978.11-203
SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.
How to cite this article: Leif, E. S., Kelenc-Gasior, N., Bloomfield, B. S., Furlonger, B., & Fox, R. A. (2024). A systematic review of social-validity assessments in the Journal of Applied Behavior Analysis: 2010–2020. Journal of Applied Behavior Analysis, 57(3), 542–559. https://doi.org/10.1002/jaba.1092