A critical appraisal of the literature exploring the surgical treatment of degenerative lumbosacral stenosis in dogs

Objective: To critically appraise the literature exploring the surgical treatment of degenerative lumbosacral stenosis in dogs. 
Background: Several surgical procedures to treat degenerative lumbosacral stenosis (DLSS) in dogs have been reported; however, definitive criteria for surgical technique preference are currently lacking (1). 
Evidentiary value and methods: A critical appraisal tool that examined the conduct and reporting of studies on the results of surgical treatment of DLSS was designed and, after a systematic search and screening of the literature, applied to 20 papers. 
Results: 18/20 included studies did not clearly report inclusion and exclusion criteria, and in 14/20 it was unclear whether participants were included consecutively. 19/20 studies reported the age, breed, and sex of the participants, whereas 13/20 did not report the duration and prevalence of clinical signs. In 13/20 studies, the condition was considered not to have been measured in a standard and reliable way. Objective outcome measures were used in 7/20 studies. 
Conclusion and application: The results demonstrate that there is room for improvement in the conduct and reporting quality of case series so that rigorous data can be generated and analysed, to inform research design, guide clinical practice, and improve veterinary healthcare delivery. 
  



INTRODUCTION
DLSS is the compression of the cauda equina caused by protrusion of supportive tissues into the vertebral canal (2). Young adult, male, and large-breed dogs seem to be predisposed to this condition, although it can also affect cats (1,3-7). Surgical treatment is often performed, particularly when medical treatment fails to provide adequate relief of clinical signs or when neurological deficits are too severe for medical management alone (1,4,8-10). Although several surgical procedures have been reported, definitive criteria for surgical technique preference are currently lacking (1). The literature reporting the results of surgical treatment of DLSS in dogs consists mainly of case reports and case series. As new and more advanced research methods have emerged, paving the way for evidence-based medicine, the scientific significance and reliability of evidence from case series and case reports, and therefore its clinical acceptance, have been questioned (11,12). The concepts of "case series" and "case reports" are, however, not well defined in the literature. Therefore, the definitions proposed by Abu-Zidan, Abbas & Hefny (13), which suggest that a case series should have more than four participants, are used throughout this paper. Critical appraisal is the process of systematically examining research evidence to judge its trustworthiness, value, and relevance in a particular context (14). Poorly designed and conducted studies can compromise the integrity of the research process and mislead healthcare decision-making at all levels (15,16). Some areas of veterinary medicine have a large body of high-ranking evidence, while many others have only low-quality evidence (17,18). A critical appraisal of the available research is therefore essential to make informed decisions in clinical practice. 
Concern has been raised about the need for better designed and reported clinical trials to assess the outcomes of surgical treatment of DLSS (19-21), which further emphasises the importance of a critical appraisal of published studies on the topic. The main objective of this study was to critically appraise the literature reporting the results of surgical treatment of DLSS; to identify gaps in current

METHODS & MATERIALS

Literature search
The literature search was performed in July 2016 using three electronic databases: PubMed, Web of Science (1900-present), and Google Scholar (1900-present). Searches in PubMed and Web of Science were performed in English, and in Google Scholar in Portuguese. Each search used similar components, searched as keywords and medical subject headings joined with Boolean operators (Annex 1). The electronic search was complemented by a hand-search of the reference lists of review articles and book chapters by Bagley (22), Sjöström (23), Sharp & Wheeler (1), Meij & Bergknut (20), and Lanz & Rossmeisl (7). The search was later updated with articles published between 2016 and 2019.

Inclusion and exclusion criteria
Case series or higher-ranking evidence were included. To be included, papers had to be in vivo studies reporting surgical outcomes of DLSS, published in peer-reviewed journals, with the full text available in English or Portuguese. Articles also had to include at least five participants and concern domestic dogs. Articles reporting only lumbosacral traumatic lesions, discospondylitis, osteochondrosis, or neoplasia were excluded.

Screening process
All electronic references were imported into EndNote and duplicates were removed. Titles and then abstracts were screened, and articles that did not meet the inclusion criteria or that met the exclusion criteria were removed. The full text of the remaining articles was then obtained, where possible. Articles were accessed online when access was available through the Faculty of Veterinary Medicine of the University of Lisbon. If an article was unavailable online or at the Faculty of Veterinary Medicine Library, an attempt was made to retrieve it by contacting the publishing journal and the authors electronically.

Critical appraisal
A critical appraisal tool for case series was designed by combining the JBI critical appraisal checklists for case reports and case series (24,25), the case report guidelines (26), and the three-minute critical appraisal checklist (27). The tool examines the conduct and reporting of the participant selection process, participant demographics and clinical information, condition measurement, and outcome assessment (Annex 2). Inclusion and exclusion criteria were considered clear when both were described in detail in the case series, and unclear when one or both were not reported. The criteria for inclusion or exclusion of participants in each study were also recorded. Consecutive inclusion of participants was considered present when the authors clearly reported that all patients presented during a given time period were included in the study; if this was not stated, it was recorded as "unclear". Reporting of participant demographics was considered clear when age, breed, and sex were reported for all included animals. Reporting of clinical information was considered clear when the prevalence and duration of clinical signs were stated. Measurement of the condition was considered standard and reliable if all animals were diagnosed with the same measure, ensuring its repeatability; if there was evidence of a lack of standardisation in condition measurement between participants, the answer was recorded as "no". In the absence of a validated diagnostic methodology for DLSS, all diagnostic methods that can provide a diagnosis of DLSS were considered valid, except that radiography performed alone, without confirmation of the diagnosis at surgery, was considered invalid. Ancillary diagnostic investigations and pain elicitation manoeuvres used for the diagnosis of DLSS were recorded. 
Reporting of outcomes or follow-up results was considered clear when the presence or absence of postsurgical clinical signs was fully described, together with how and when follow-up was performed and measured. The reporting and measurement methodology for outcome assessment was recorded, as were study follow-up rates. A summary of the key criteria used to answer the critical appraisal tool questions is presented in Table 1. The results of the critical appraisal were then collated into a summary table and analysed.
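To make the decision rules above concrete, they can be sketched as a small Python function applied to one study at a time, plus a helper that tallies the answers into the k/n counts used in the Results. This is a simplified illustration under stated assumptions, not the authors' instrument: the StudyReport fields are hypothetical, diagnostic methods are reduced to the imaging techniques used in each dog, and coding the demographic, clinical-information, and outcome questions as "no" when not met is an assumption.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical summary of what one case series reports. The field names are
# illustrative assumptions; they are not part of the published appraisal tool.
@dataclass
class StudyReport:
    inclusion_criteria_reported: bool
    exclusion_criteria_reported: bool
    consecutive_inclusion_stated: bool      # all patients in a period included
    age_breed_sex_for_all: bool
    prevalence_of_signs_reported: bool
    duration_of_signs_reported: bool
    imaging_per_dog: list[frozenset[str]]   # imaging methods used for each dog
    surgical_confirmation: bool
    postsurgical_signs_described: bool
    follow_up_how_and_when_described: bool


def appraise(s: StudyReport) -> dict[str, str]:
    """Apply the decision rules described above to one study (simplified sketch)."""
    both_criteria = s.inclusion_criteria_reported and s.exclusion_criteria_reported
    # Standard: every dog was diagnosed with the same set of imaging methods.
    standard = len(set(s.imaging_per_dog)) == 1
    # Invalid: radiography alone for every dog, with no confirmation at surgery.
    radiography_only = all(m == {"radiography"} for m in s.imaging_per_dog)
    return {
        "selection_criteria": "yes" if both_criteria else "unclear",
        "consecutive_inclusion": "yes" if s.consecutive_inclusion_stated else "unclear",
        "demographics": "yes" if s.age_breed_sex_for_all else "no",
        "clinical_information": "yes"
        if s.prevalence_of_signs_reported and s.duration_of_signs_reported
        else "no",
        "standard_measurement": "yes" if standard else "no",
        "valid_method": "no" if radiography_only and not s.surgical_confirmation else "yes",
        "outcome_reporting": "yes"
        if s.postsurgical_signs_described and s.follow_up_how_and_when_described
        else "no",
    }


def summarise(all_answers: list[dict[str, str]]) -> dict[str, str]:
    """Collate per-study answers into 'k/n' counts of "yes" for each question."""
    n = len(all_answers)
    yes = Counter(q for ans in all_answers for q, a in ans.items() if a == "yes")
    return {q: f"{yes[q]}/{n}" for q in all_answers[0]}
```

Under these rules, a study that imaged every dog with radiography alone and did not report surgical confirmation would be recorded as standardised but invalid, whereas one that combined radiography with CT in some dogs and MRI in others would be recorded as valid but not standardised, mirroring the contrast discussed below for Slocum & Devine (33) and Suwankong et al. (10).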

RESULTS

Search and screening process
A total of 143 papers were initially identified. Following screening, 123 were excluded on the basis of title and abstract, and 20 papers fulfilled the inclusion criteria.

Critical appraisal
The results of the closed-ended questions of the critical appraisal are summarised in Table 2, and Table 3 summarises the answers to the open-ended questions. Only 2/20 studies included in the critical appraisal clearly reported the criteria for inclusion and exclusion of participants in the case series, and 8/20 did not clearly mention any selection criteria. Of the 9 studies that reported exclusion criteria, 4/9 excluded animals with lumbosacral disease other than DLSS (discospondylitis, neoplasia, traumatic injuries) or with concurrent orthopaedic problems, and one study excluded only participants with concurrent orthopaedic problems. Whereas 9/20 studies clearly reported exclusion criteria, only 5/20 reported inclusion criteria. Consecutive inclusion of participants was clearly reported in 6/20 studies, while in 14/20 it was unclear whether participants were included consecutively. Age, breed, and sex of the participants were clearly reported in 19/20 studies, and some studies additionally reported weight (28-31) and level of activity (4,32). Duration and prevalence of clinical signs were not clearly reported in 13/20 studies. Measurement of the condition (DLSS) was considered standardised and reliable in 7/20 appraised studies. When time of publication was considered, particularly for studies from the last 16 years, 10/20 had a diagnostic methodology considered standardised and reliable.

Table 1 - Key criteria to answer each question of the critical appraisal tool.
The majority of studies (19/20) used valid methods for identification of the disease; one study (33) did not. Radiography was used in 15/20 studies. CT and MRI were used in 8/20 and 12/20 included studies, respectively, but these proportions change considerably when only studies published in the last 16 years are considered: in that period, 5/10 and 9/10 studies used CT and MRI, respectively, as a diagnostic tool. In 17/20 studies, outcome assessment or follow-up of cases was considered adequately performed. In all included studies, outcomes were measured as the presence or absence of postsurgical signs. In 10/20 of these, postsurgical signs were assessed by physical examination of the participants after discharge. Of the studies reporting outcomes as postsurgical signs without reporting a post-discharge physical examination, 3/6 did not report how outcomes were assessed and 2/6 used owner questionnaires. Owner assessment was performed in 12/20 included studies, 7/12 of which used a standardised questionnaire, although not all of these provided a detailed description of the questionnaire content. In the remaining studies, the method of owner assessment was not clearly specified. Objective outcome measures, which include diagnostic imaging techniques, force plate analysis (FPA), and performance in standardised exercises, were used in 7/20 included studies: 2/7 used FPA, 5/7 used diagnostic imaging techniques (radiography or CT), and 1/7 used performance in standardised exercises. In 9/20 included studies, the follow-up rate was reported, or could be calculated from available data, at standard periods for at least one outcome measure. The mean follow-up rate in these studies was 93%. 
Although follow-up was performed at standard times within each study, these times differed between studies, so the follow-up rate was calculated at the time of the first outcome measurement. In 11/20 included studies, the follow-up rate was reported or could be calculated from available data; however, participants had different follow-up times for all outcome measures. The mean post-discharge follow-up rate in these studies was 94%. The minimum time of outcome assessment across the included studies ranged from immediately after surgery to 1.5 years. In studies that assessed long-term outcome (more than 1 year) at standard periods (4/20), the mean follow-up rate was only 37%.

DISCUSSION

Clear criteria for inclusion and exclusion of participants in a case series are essential for readers to apply the findings to their own patients and to help define who has received the intervention and who should not receive it (34). To allow case series to be compared, clinical inclusion and exclusion criteria need to be reported accurately and clearly (21). Many different terms are used in the literature to describe lumbosacral disease, which involves diverse pathological events (4,7,23,35,36); this further highlights the importance of clear inclusion and exclusion criteria in case series reporting the surgical treatment of this complex disease. Because they are non-randomised, case series are highly susceptible to bias, especially selection bias (37). Selection bias occurs not only when patients are not selected randomly, but also when not all patients presenting with the relevant condition are included consecutively in order of entry (38). Consecutive inclusion of participants is therefore highly relevant in case series, as it increases their reliability and credibility and reduces selection bias (25,34). 
A full description of participant demographics is essential to characterise the generalisability of a study's findings and to make comparisons (25,39). Age, breed, and sex of the participants were considered highly relevant because they provide valuable information on disease predisposition and prognostic factors. In studies of dogs with DLSS these seem particularly important, given the apparent predisposition of young adult, male, and large-breed dogs, especially German Shepherd dogs (4,5,7). A high level of activity has been reported as a risk factor for the development of DLSS and may play a role in prognosis (1,5,19,40-42), which highlights the importance of reporting it among patient demographics in case series. Weight may also play a part in the development of DLSS and in prognosis, so it is also relevant to report (19). It is essential that all clinical information on the participants is clearly and fully reported (25,34,43). A lack of such information hampers a full, in-depth characterisation of the disease and jeopardises investigation of the relationship between the chronicity of clinical signs and prognosis. Duration of clinical signs was used as a criterion for clear reporting because, although some studies have found no correlation between duration of clinical signs and prognosis (4), some clinical signs of chronic progression have been associated with a poorer postsurgical outcome (5,32). Prevalence of clinical signs was also analysed because it is needed to fully characterise the disease severity of all participants and the generalisability of the treatment results (38,51). The method used to measure the condition should be the same for all patients (standard) and produce repeatable and reproducible results (reliable) (25,38,51). 
Studies from the last 16 years appear to have measured the participants' condition in a more standardised and reliable way, which may indicate a trend towards more rigorous condition measurement in studies on the surgical treatment of DLSS over time.

Veterinary Evidence
Standardisation and reliability should not, however, be mistaken for study quality. For example, in the study by Slocum & Devine (33), the diagnosis relied on the signalment, physical examination, and radiographic findings of all included animals, so the condition was considered to have been measured in a standard and reliable way. In the study by Suwankong et al. (10), diagnosis relied on signalment, physical examination, radiography, and CT or MRI findings; because not all included animals were diagnosed using the same methods, this study was not considered to have a standardised methodology. Unlike conventional radiography, however, CT and MRI have proven to be sensitive and specific detectors of cauda equina compression in dogs (1,2,20,41,52,53). Thus, although one study (33) measured the condition in a standard and more reliable way, the other (10) used far more accurate and advanced diagnostic imaging techniques, so their quality cannot be directly compared. It is essential that a diagnosis is made based on existing definitions or diagnostic criteria. The validity of outcome assessment depends on the use of validated condition measurement tools (25,34). In the study by Slocum & Devine (33), radiography was used alone as a diagnostic imaging tool and surgical confirmation of DLSS was not reported. This may be because distraction-fusion was the surgical technique employed, which may not allow surgical detection of cauda equina compression. Although normal radiographic findings do not exclude DLSS (53-56), most studies used radiography as a diagnostic method, reflecting its common use in patients with suspected DLSS. This may be because, compared with CT or MRI, radiography is widely available, and it can help exclude diseases other than DLSS that may mimic its clinical signs (20,23,54,56-58). 
Most studies from the last 16 years used CT or MRI to identify the condition. This further supports the premise that these advanced diagnostic imaging techniques have become increasingly available and valuable tools, both in the diagnosis of DLSS and in veterinary medicine generally (2,20,53). In a review article, Jeffery (21) raised concern about the quality of reporting of pain evocation methods in studies on patients with DLSS. Pain evocation was therefore also analysed in the 20 included studies: 95% (19/20) reported how lumbosacral pain was elicited, which suggests that this aspect is generally well reported. It is extremely important that the full clinical condition of the participants after an intervention or treatment is clearly reported (25,43) in order to characterise the success of the intervention. Reporting how and when follow-up was performed and measured is equally important, so that outcomes of different interventions can be compared and generalised. Although some studies that did not report postsurgical complications were classified as having a clear reporting methodology for follow-up results, reporting of complications is also relevant in case series. The diagnosis of DLSS should be based on, among other factors, physical examination of the animal (1,7,20); postsurgical assessment should therefore also rely on physical examination by an experienced professional. When owner questionnaires are used, a detailed description of their content is desirable. Although subjective methods such as owner questionnaires and veterinary lameness scores have been validated and are useful outcome measures, they are a source of assessor bias. A caregiver placebo effect, in both veterinarians and pet owners, should therefore be considered when interpreting reports of response to treatment (59,60). 
However, there are currently no validated owner questionnaires or veterinary lameness scores specific to outcome assessment in animals treated for DLSS, which is a limitation. Objective outcome measures are valuable for improving the understanding of treatment effects in research studies, and they provide results that are comparable between studies (21). Although imaging techniques are valuable for assessing screw position and condition and bone fusion after distraction-fusion (29,49), no significant association between imaging studies and postoperative outcomes has been identified (42). Furthermore, there is no apparent correlation between imaging findings and disease severity (2,61).
Loss of patients to follow-up is important because it affects the validity of the treatment or study protocol (27). Assessing the follow-up rates of many studies was challenging because follow-up at different times was reported unclearly. For studies in which follow-up was performed at standard times, the rate was calculated at the time of the first outcome measurement. Presenting these data as a mean rate for the first outcome measure alone, rather than over the entire follow-up period, overestimates the follow-up rate, which must be considered when interpreting them. Conversely, because only some of the studies in this category used standard periods of outcome measurement exclusively, this value could be underestimated for studies that also reported outcomes at different times. The follow-up rate of studies reporting outcome assessment at different times for all outcome measures was higher than that of studies in which outcome assessment was performed at standard periods; however, comparison between these two types of studies is debatable. The minimum time of outcome assessment varied widely between studies, which further emphasises the heterogeneity of the included studies. Although clear reporting of follow-up rates and periods is understandably difficult in retrospective studies with many participants, it ensures that treatment results can be compared and findings generalised, and that treatment results can be fully interpreted.
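Why a rate taken at the first outcome measurement overestimates follow-up can be made explicit with a simple definition. As a sketch, assume that study i enrols n_i dogs, that m_i(t) of them are assessed at follow-up time t, that t_{i,1} is the first scheduled assessment in study i and t_{i,j} any later one, and that a dog lost to follow-up is not re-assessed later (an assumption the included studies may not satisfy). The symbols are introduced here for illustration, and treating the reported means as unweighted averages over k studies is also an assumption.

```latex
F_i(t) = \frac{m_i(t)}{n_i}, \qquad
\bar{F}_{\text{first}} = \frac{1}{k}\sum_{i=1}^{k} F_i\!\left(t_{i,1}\right)

t_{i,1} \le t_{i,j} \;\Rightarrow\; F_i\!\left(t_{i,1}\right) \ge F_i\!\left(t_{i,j}\right)
\;\Rightarrow\; \bar{F}_{\text{first}} \ge \frac{1}{k}\sum_{i=1}^{k} F_i\!\left(t_{i,j_i}\right)
```

For instance, in a hypothetical study of 20 dogs in which 19 are assessed at the first visit and 8 at one year, the rate is 19/20 = 95% at the first measurement but 8/20 = 40% at one year.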

LIMITATIONS
This critical appraisal was subject to several limitations. First, no validated, specific critical appraisal tool for case series was identified in the literature. To overcome this, three critical appraisal checklists and a guideline for case reports were combined to design a critical appraisal tool for this study. Second, many different terms are used to describe lumbosacral disease and the various surgical techniques used to treat it, which posed a limitation throughout the study, particularly when establishing the search strategy. An attempt was made to overcome this by using a wide range of search terms. Third, the authors were familiar with research on the surgical treatment of DLSS, so blinding of the screening process was not possible. Furthermore, the inclusion and exclusion criteria for studies, as well as the criteria for answering the critical appraisal checklist, may have been influenced by prior knowledge of the literature. Finally, although an attempt was made to establish clear and objective criteria for answering the questions in the designed critical appraisal tool, its subjective nature remains a limitation.

CONCLUSIONS
Validation of a critical appraisal tool for case series would greatly benefit the critical appraisal of such studies. Although there are several obstacles to conducting randomised controlled trials or other studies providing a higher level of evidence on which treatment option is best for patients with DLSS, this study has demonstrated that there is room for improvement in the conduct and reporting quality of case series, so that rigorous data can be generated and analysed to inform research design, guide clinical practice, and improve veterinary healthcare delivery (26). However, even the most robust, prospectively conducted, and clearly reported case series have limitations that prevent the evidence they provide from matching that of well-constructed randomised controlled trials (62).

PROPOSED STUDY CHARACTERISTICS
Reporting guidelines improve the completeness of published scientific reports, so basing the manuscript of a case series on a validated guideline such as the case report (CARE) guidelines is advised (26). A case series on the surgical treatment of DLSS should begin with a clear title that includes the words "case series" and the intervention of greatest interest, so that reviewers and veterinary healthcare professionals can identify the topic and goals of the study early (26,43). Including keywords has been advised, although this may depend on specific journal requirements; medical subject headings are preferred (12). Clinical inclusion and exclusion criteria should be reported clearly and accurately, because they allow case series to be compared against one another. One of the main factors limiting the criteria for participant selection is the retrospective nature of the literature on the topic; prospective studies would overcome this limitation (21,60). Consecutive inclusion of participants ought to be conducted and clearly reported, because it increases case series reliability and credibility and reduces selection bias (25,34). Patient demographic information, such as age, breed, sex, level of activity, and weight, should be fully reported so that predisposition and risk factors can be assessed (25,39). A complete report of the clinical information and diagnostic methodology for all participants, particularly the findings of physical, neurological, and orthopaedic examinations, is advised. In addition to the prevalence of clinical signs, their duration also seems to be an important measure (5,32,38,51). Standardisation of diagnostic methodologies increases study reliability and is therefore recommended (25,38,51). The manoeuvre used for pain elicitation should also be described. 
Advanced diagnostic imaging techniques are becoming increasingly available and are valuable tools for assessing cauda equina compression, so their use has been recommended where available (2,20,21). Describing the criteria on which the diagnosis was based helps establish whether the results of a study are reproducible (25). The complete treatment protocol should be clearly described so that readers can understand and replicate it (24,43). Outcomes and follow-up results also need to be fully reported and measured, particularly how outcome measurement was conducted and which tools the assessors used. Objective outcome measures, ideally validated, such as FPA and performance in standardised exercises, have been advised (21). However, concurrent use of standardised and validated subjective outcome measures, such as owner questionnaires and veterinary lameness scores, could also be useful, provided that a caregiver placebo effect is accounted for (59).
Blinding of outcome assessment could strengthen outcome measurement by limiting systematic bias (21). Standardisation of outcome measurement would, similarly to standardisation of