Objective: Stability of diagnosis is one measure of predictive validity for psychiatric syndromes. It is an under-studied area despite its clinical and research implications. This report aimed to critically review the literature concerning diagnostic stability in functional psychosis.

Methods: Articles concerned with evaluating the diagnostic stability of functional psychosis and factors associated with diagnostic change were reviewed.

Results: Despite methodological variation, schizophrenia was found to be the most stable diagnosis followed by affective psychosis. Other psychotic disorders were diagnostically unstable over time. Around one-fifth of patients with first-onset psychosis had their diagnoses revised at follow-up. Diagnostic change occurred early in the course of the psychotic illness. The major pattern of diagnostic shift was towards schizophrenia spectrum disorders, particularly schizophrenia. Few variables were identified as predictors of such diagnostic conversion and the evidence established thus far is inconclusive.

Conclusions: The present analysis indicates that diagnostic uncertainty and temporal instability are common in the early phase of psychosis, especially in less prevalent diagnostic categories. It also highlights the limitations of the contemporary nosological classification in functional psychosis. In the absence of biological markers, a diagnostic process taking into account longitudinal observations across consecutive episodes should be a major requirement for making a definitive diagnosis.

Key words: Diagnosis, differential; Early diagnosis; Follow-up studies; Psychotic disorders


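Objective: Diagnostic stability is a measure of the predictive validity of psychiatric syndromes. Despite its importance for clinical practice and research, it has rarely been studied. This article reviews the literature on diagnostic stability in functional psychosis.

Results: Setting aside methodological differences, schizophrenia remained the most stable diagnosis, followed by affective psychosis. For other psychotic disorders, the diagnosis changed over time. About one-fifth of patients with first-onset psychosis had a change of diagnosis at follow-up. Diagnostic change occurred only in the early course of the psychotic illness. The change was mainly towards schizophrenia spectrum disorders, particularly schizophrenia. Few predictors of diagnostic change could be identified, and the available evidence is inconclusive.

Conclusions: This analysis shows that uncertain and temporally unstable diagnoses are common in early psychosis, particularly in less common psychotic disorders. It also shows the inadequacy of the contemporary classification of functional psychosis. In the absence of biological markers, making a definitive diagnosis requires long-term observation across consecutive episodes of psychosis.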



Diagnosis is regarded as a sine qua non for clinical practice and research.1 It provides information about patients’ symptom profiles, prognosis, and treatment outcomes, and sets the boundaries for research through delineating homogeneous patient groups.2 Unlike other branches in medicine where there is better understanding of the underlying biological processes, in psychiatry the diagnoses are still based on identification of clinical syndromes. The introduction of explicit operational criteria and rule-based classifications significantly improved diagnostic agreement.3 Nevertheless, adequate diagnostic reliability does not necessarily provide information about the construct of disorders.4 Owing to the lack of objective measurements for making definitive diagnoses, operationalised diagnoses should therefore be regarded as provisional and the validity of the diagnoses of psychotic disorders incorporated by the contemporary classifications cannot be taken for granted.3,5

It has been argued that a valid diagnostic category should be defined by more fundamental characteristics such as physiological, pathological, or genetic abnormalities.3 In the absence of clinicopathological correlates, it is difficult to verify psychiatric syndromes.6 Outcome has been regarded as the most important and the most widely applicable criterion of validity in the context of clinical psychiatry.7 Stability of diagnosis over time, being an outcome measure, has been postulated as one criterion for diagnostic validity8,9 as it is the measure of the degree to which a diagnosis remains the same at subsequent evaluations.10 It is assumed that the more stable the diagnosis, the more likely it is to reflect a basic and consistent psychopathological or pathophysiological process, and hence the more valid it is.11

Diagnostic revision can be attributed either to a genuine change in the clinical picture or to methodological artefacts such as information variance, unreliable assessment, inconsistent application of diagnostic criteria, and low inter-rater reliability.12 In most epidemiological studies, a subject’s lifetime diagnosis for longitudinal outcome analysis is usually based on the cross-sectional diagnosis derived from a baseline assessment.13 Yet it is known that any given patient’s diagnosis can change over time.14,15 Longitudinal diagnostic instability thus raises concerns regarding the validity of research into aetiology, genetics, prognosis, and treatment efficacy.16 Clinically, diagnostic misclassification can lead to iatrogenic effects through inappropriate treatment recommendations.17

Given the clinical and research significance of diagnostic instability, along with its relevance to the nosological framework of functional psychosis, we performed a systematic review of the literature with an aim to evaluate: (1) the diagnostic stability of functional psychotic disorders; (2) patterns of diagnostic shift among various categories of functional psychosis; and (3) factors associated with diagnostic instability. We will also discuss methodological variations in current research on diagnostic stability and suggest strategies for future research.


A literature search was conducted using the Medline computerised database to identify relevant English-language articles published from 1 January 1980 to 31 October 2008. Keywords used as search terms included: diagnostic stability, temporal stability, diagnostic consistency, AND psychotic disorder, functional psychosis, first-episode psychosis. Citations within identified papers were included as additional sources. Earlier studies conducted before the introduction of operational diagnostic criteria were excluded from the review as the results of these studies were confounded by inconsistent diagnostic formulation with low reliability.11 Studies that only focused on child and adolescent samples were also excluded.18-21 We also excluded two other studies which measured agreement between diagnoses assigned in an emergency setting and either structured interview-derived diagnoses22 or inpatient discharge diagnoses.23 These studies evaluated the reliability of diagnostic procedures conducted in emergency settings rather than diagnostic stability per se.
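The way the two groups of keywords combine can be sketched as a Boolean query string. This is only an illustration: the text specifies AND between the groups, while the OR within each group, the quoting of phrases, and the helper name build_query are assumptions. Date and language limits are not modelled.

```python
def build_query(stability_terms, population_terms):
    """Join two keyword groups into one Boolean search string.

    Assumption: terms within a group are alternatives (OR); the two
    groups must both match (AND), as the search description suggests.
    """
    def any_of(terms):
        # Quote each phrase so it is searched as a unit.
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

    return any_of(stability_terms) + " AND " + any_of(population_terms)


# Keywords as listed in the search description.
stability_terms = ["diagnostic stability", "temporal stability", "diagnostic consistency"]
population_terms = ["psychotic disorder", "functional psychosis", "first-episode psychosis"]

query = build_query(stability_terms, population_terms)
print(query)
```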

A total of 35 publications were selected for review. Tables 1 and 2 summarise their key features.


Methodological Considerations

Methodological heterogeneity was observed with respect to subject collection, diagnostic scope, sample size, diagnostic assignment, and time intervals studied (Table 3). Sample selection and diagnostic ascertainment are the two fundamental issues affecting evaluation of the results of diagnostic stability. Methodologies affecting these two aspects varied widely and are discussed in detail below.

Sample Selection

Re-admission Versus First-episode Sample

Many studies recruited subjects from re-admission samples, making such studies limited by the bias inherent in sampling rehospitalised patients. These patients were much more likely to have established chronic illnesses, causing an overestimation of diagnostic stability.24 It is postulated that the first few years following the onset of psychosis are the critical period in illness evolution.25 Studying first-episode cohorts allows researchers to capture the true diversity in the course of functional psychosis from the onset. It also ensures relative homogeneity within the sample with respect to illness chronicity and treatment exposure.26 Nevertheless, it should be noted that a “first-admission” sample is not synonymous with a “first-episode” sample. The former excludes subjects with less severe first psychotic episodes that did not require hospitalisation. A cohort recruited at first presentation for treatment, i.e. a “first contact to treatment” sample, is less biased and thus more representative of patients with first-onset psychosis.27

Diagnostic Scope

Some studies only focused on non-affective psychosis.28 As there is significant overlap between affective and psychotic symptoms in the early phase of illness,11 exclusion of affective psychosis can introduce bias by automatically removing from the analysis those who might have their baseline diagnosis of affective disorder reclassified at follow-up. Another sampling bias arises from the exclusion of patients with comorbid substance abuse. Concomitant substance use is frequently observed among patients presenting for treatment with psychosis.29 It has been reported that approximately one-third of those with first-episode psychosis had a co-existing substance use disorder.30 Because of the pervasiveness of substance abuse problems in patients with psychosis, the generalisability of findings from studies using this exclusion criterion would be severely compromised.

Diagnostic Ascertainment

Diagnostic Criteria

Various diagnostic schemes have been used in studies investigating diagnostic stability but there is little consensus on the validity of any particular diagnostic system over the others and a lack of objective indicators for definitive diagnoses.31 Different diagnostic criteria denoting the same disorder will certainly identify patient groups with overlapping but non-identical characteristics. These discrepancies in diagnostic definitions need to be considered when interpreting the findings.

Methods of Diagnostic Assignment

Misclassification of diagnoses could also be attributed to diagnostic evaluation procedures per se, i.e. procedural validity.31 Case registers, structured interviews, and medical record reviews were methods applied either alone or in combination for diagnostic assignment. Studies deriving diagnoses from case registers were limited by the variation in information and variability in application of diagnostic criteria.12 Retrospective application of diagnostic criteria to case notes was adopted in numerous studies and was shown to be reliable for diagnostic ascertainment.24 Nonetheless, the validity of results is limited by the documentary quality of medical notes. Structured interviews provide a systematic way of formulating differential diagnoses. However, denial of symptoms and recall bias are common in patients with psychosis and affective disorders, so relying on structured interviews to make a diagnosis or verify previous episodes without incorporating other sources of information impairs the reliability and validity of generating a longitudinal diagnosis.32,33 Although there is no gold standard method for formulating psychotic disorder diagnoses, the “best-estimate” consensus diagnostic procedure, which uses all available information from multiple sources together with longitudinal assessments, has been considered reliable and is currently the accepted standard method for diagnostic assignment.34

Measures of Diagnostic Stability

Diagnostic stability was most commonly measured and presented as prospective and retrospective consistencies. Prospective consistency is defined as the proportion of subjects in a category at baseline assessment who retained the same diagnosis at the end of the follow-up period. Retrospective consistency is the proportion of subjects in a category at the end of follow-up who had received the same diagnosis at the baseline assessment.13 Diagnostic stability is considered present if the information within the follow-up period confirms the original baseline diagnosis, irrespective of whether the symptoms of the original diagnosis were actively present during follow-up assessments.12
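The two consistency measures differ only in their denominator, which a small worked example makes concrete. The sketch below is illustrative: the function name and the 10-subject cohort are hypothetical and are not drawn from any reviewed study.

```python
def consistency(pairs, category):
    """Return (prospective, retrospective) consistency for one category.

    pairs: iterable of (baseline_diagnosis, final_diagnosis), one per subject.
    A measure is None when its denominator is zero.
    """
    pairs = list(pairs)
    n_baseline = sum(1 for baseline, _ in pairs if baseline == category)
    n_final = sum(1 for _, final in pairs if final == category)
    n_both = sum(1 for baseline, final in pairs
                 if baseline == category and final == category)
    # Prospective: of those in the category at baseline, how many kept it.
    prospective = n_both / n_baseline if n_baseline else None
    # Retrospective: of those in the category at follow-up, how many began in it.
    retrospective = n_both / n_final if n_final else None
    return prospective, retrospective


# Hypothetical cohort of 10 subjects (invented for illustration only).
cohort = (
    [("schizophrenia", "schizophrenia")] * 4
    + [("schizophreniform", "schizophrenia")] * 2
    + [("schizophreniform", "schizophreniform")] * 1
    + [("bipolar", "bipolar")] * 2
    + [("bipolar", "schizophrenia")] * 1
)

for diagnosis in ("schizophrenia", "schizophreniform", "bipolar"):
    print(diagnosis, consistency(cohort, diagnosis))
```

In this invented cohort, all 4 subjects with baseline schizophrenia keep the diagnosis (prospective consistency 4/4), but 7 subjects carry it at follow-up, so retrospective consistency is 4/7. An influx of cases into a category lowers its retrospective consistency while leaving its prospective consistency unchanged.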

Review of Study Findings

Non–first Episode Psychosis Study


Most of these studies selected patients from re-admission samples.10,14,15,32,35-43 One recent study included subjects from outpatient and emergency settings apart from rehospitalised patients.16 A study initiated by the World Health Organization recruited a mixture of both prevalence and incidence samples from 4 independent cohorts with different inclusion criteria for analysis.44 The follow-up period varied widely, from 2 to 40 years, with the majority of studies using a follow-up period of less than 5 years.

Most studies were retrospective in design, using case registers as the only source of information for diagnostic ascertainment.10,15,16,35,36,38,40-42 Forrester et al43 applied the operational criteria checklist to case notes of re-admitted subjects to generate diagnoses. Other researchers utilised a combination of either case register9 or medical note reviews14,32,37 and follow-up interviews for diagnostic assignment. The baseline diagnosis was determined retrospectively while the final diagnosis was formulated using information obtained from both interviews and medical records. Only one follow-up interview was conducted, however, and the time interval between the initial episode and the follow-up assessment ranged from 12.5 years to 40 years.

Findings of Diagnostic Stability

Schizophrenia had the highest diagnostic stability (mostly above 70%), followed by affective disorder (mostly below 70%).15,32,35-42 Many studies did not differentiate affective disorder into bipolar affective disorder and depressive disorder. Most studies did not discriminate between psychotic and non-psychotic affective disorders. Relatively few studies examined the diagnostic stability of other psychotic disorders such as delusional disorder, schizoaffective disorder and acute psychoses, which have been reported to be diagnostically unstable over time.10,14,40-43,45

Findings of Diagnostic Shift

Few studies assessed the pattern of diagnostic change. Conflicting results regarding the diagnostic shift between schizophrenia and bipolar affective disorder were noted. Some studies demonstrated a minimal diagnostic switch10,32 while others found considerable conversions between schizophrenia, bipolar affective disorder, and schizoaffective disorder.14,15,37 Findings regarding factors associated with diagnostic change were also inconsistent. Some researchers revealed no variables associated with diagnostic shift apart from diagnostic group memberships.42 Others reported that gender, ethnicity,15 age and hospital changes10 were associated with a shift in the schizophrenia diagnosis. Diagnostic consistency also correlated with treatment settings; a diagnosis made for an inpatient was more stable than that derived in outpatient and emergency settings.16

First-episode Psychosis Study


Most of these studies examined a wider spectrum of psychotic disorders,11-13,17,28,46-54 though some reports focused solely on single diagnostic categories such as schizophrenia,6 schizophreniform disorder55 and acute and transient psychotic disorders (ATPD).56-58 Affective psychosis28 and comorbid substance abuse11,53 were excluded by some studies.

Most studies had short follow-up intervals, with the majority following up for less than 2 years (range, 6 months to 13 years). Three studies had follow-up durations of 5 years or above6,46,47 but one such study assessed schizophrenia only6 and another one was limited by its small sample size (n = 85).47

The Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV)59 was the diagnostic system most commonly adopted by these studies, followed by DSM-IIIR60 and the International Classification of Diseases, 10th revision (ICD-10).61 Most studies recruited a “first contact to treatment” sample rather than a “first-admission” sample.6,11,17,28,47-51,53-54,57 The majority used a prospective design;6,11-13,28,46-53,55-57 however, many had only one follow-up assessment, conducted a long time after the initial diagnostic evaluation. Some studies ascertained the diagnosis via structured interviews without a case notes review covering the interval between the 2 assessment time points.28,53-55 Changes in the subject’s diagnostic status and clinical picture within this interval might thus be missed, and the case misjudged as diagnostically stable.12

Some studies relied on a single rater to make either the initial or final diagnosis, or even on the same researcher to determine both diagnoses in individual subjects.6,28,46,51 Even among studies that adopted a more reliable consensus procedure, large variation remained. Very few studies conducted independent diagnostic assignments by at least 2 psychiatrists blinded to the facility diagnosis.12,13,52 Some studies involved 2 diagnosticians in the consensus procedure, but one of them was responsible for compiling all the necessary information and presenting the data to the other to generate the consensus diagnosis.11,47-50 In this context, bias might be introduced when formulating the consensus diagnosis as it is likely to depend on the judgement of the diagnostician who prepared the clinical information.

Findings of Diagnostic Stability

Overall diagnostic consistency was around 70%. Schizophrenia was the most stable initial diagnosis (mostly above 90%), followed by bipolar affective disorder (mostly above 80%) and depressive disorder (about 70%). Inconsistent results were observed in delusional disorder (33-100%) and acute and brief psychoses (27-100%).11-13,17,28,46-54,56-58 Diagnostic categories like schizophreniform disorder, schizoaffective disorder and unspecified psychosis were the least stable. For patients initially diagnosed as having schizophreniform disorder, 50 to 100% had their diagnoses switched to schizophrenia or rarely schizoaffective disorder later on.17,28,49,51,55 With few exceptions, schizoaffective disorder was found to have low temporal stability (below 40%) with subsequent transition mainly to schizophrenia and bipolar affective disorder at follow-up.11,13,46,48

Findings of Diagnostic Shift

The diagnostic category receiving the largest influx of cases at follow-up was schizophrenia spectrum disorder, with most switching to schizophrenia.13,17,28,49,51,53 The category with the most frequent diagnostic transition to the latter was schizophreniform disorder. There were more shifts from affective disorder to schizophrenia spectrum than vice versa. Substantial diagnostic movement to affective disorder and schizophrenia from acute psychosis was also demonstrated by certain studies.48,56,57 Few studies examined predictors of diagnostic instability and results thus far are inconclusive. A longer duration of untreated psychosis (DUP) and poorer premorbid adjustment were found to be predictive of a diagnostic shift to schizophrenia spectrum or schizophrenia.13,17,28,51 Conflicting results regarding comorbid substance abuse and baseline symptom severity as predictors of diagnostic change towards schizophrenia spectrum were observed.13,17,51


Despite the wide disparity in methodological design, relatively uniform results have been observed across studies, especially those recruiting first-episode cohorts. The overall diagnostic consistency for first-onset psychosis was around 70%. Both schizophrenia and bipolar affective disorder displayed high levels of diagnostic stability, supporting the distinct nature of these disorders.62 Other psychotic disorders including schizoaffective disorder, schizophreniform disorder, acute and brief psychoses, delusional disorder and unspecified psychosis were found to be diagnostically unstable over time with a prospective consistency usually below 50%. A diagnosis of schizophrenia showed a relatively lower retrospective consistency compared with its prospective consistency, indicating that more subjects changed their diagnoses towards rather than away from schizophrenia. Schizophrenia was also shown to have a high level of specificity; patients diagnosed as not having schizophrenia at follow-up rarely received a schizophrenia diagnosis at baseline.13,17,28,48,50
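The relationship between consistency, sensitivity, and specificity described above can be illustrated by treating the follow-up diagnosis as the reference standard and the baseline diagnosis as the test. The sketch below makes that assumption explicit. The function name and the 100-subject counts are hypothetical, chosen only to reproduce the qualitative pattern reported, and are not data from the reviewed studies.

```python
def baseline_accuracy(pairs, category):
    """Sensitivity and specificity of the baseline diagnosis for one category,
    using the follow-up (final) diagnosis as the reference standard.

    pairs: iterable of (baseline_diagnosis, final_diagnosis), one per subject.
    """
    tp = fn = fp = tn = 0
    for baseline, final in pairs:
        if final == category:
            if baseline == category:
                tp += 1  # category at both time points
            else:
                fn += 1  # shifted into the category after baseline
        else:
            if baseline == category:
                fp += 1  # shifted out of the category
            else:
                tn += 1  # never in the category
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {"sensitivity": sensitivity, "specificity": specificity}


# Hypothetical cohort of 100 subjects (invented counts, for illustration only).
cohort = (
    [("schizophrenia", "schizophrenia")] * 30
    + [("schizophrenia", "other")] * 2
    + [("other", "schizophrenia")] * 12
    + [("other", "other")] * 56
)

result = baseline_accuracy(cohort, "schizophrenia")
print(result)
```

With these invented counts, prospective consistency is 30/32 and retrospective consistency is 30/42, the latter being numerically identical to sensitivity under this framing. Specificity is 56/58: few subjects without schizophrenia at follow-up had received that diagnosis at baseline, whereas the 12 subjects who shifted into schizophrenia are the false-negative cases at intake.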

Diagnostic change in functional psychosis was found to be a relatively early phenomenon. Diagnostic revision tended to occur within the first few years after the onset of psychotic illness.14,17,44,46,51 Around 30% of patients with a first psychotic episode were re-diagnosed at follow-up and a diagnostic shift to schizophrenia spectrum occurred most frequently. Within this group the majority changed to a diagnosis of schizophrenia.13,17,28,44,48,51,53

Contrary to studies using re-admission samples, which demonstrated that patients presenting with manic episodes at a younger age of onset were frequently misdiagnosed as having schizophrenia,10 results of first-episode studies suggested that this misclassification bias was no longer a major reason for diagnostic inaccuracy in bipolar affective disorder.13,63 Nonetheless, a high frequency of mood-incongruent or Schneiderian first-rank psychotic symptoms in bipolar affective disorder, particularly in first-episode mania, might contribute to diagnostic ambiguity in the early phase of a psychotic illness.64,65

Virtually all studies reported that schizophreniform disorder was diagnostically unstable and the majority with this baseline diagnosis switched to schizophrenia over time.17,28,48,49,51,55 It was therefore suggested that such high temporal instability was in part due to an arbitrary separation between these 2 diagnoses and the subsequent conversion merely reflected the natural illness evolution rather than diagnostic change per se.48,51

Acute and brief psychoses were consistently shown to be diagnostically unstable and frequently changed to schizophrenia and affective disorder at follow-up.45,56-58 Some researchers suggested that subdividing ICD-10 acute polymorphic psychotic disorder into the groups with and without schizophrenic symptoms might be unwarranted since such differentiation has no bearing on outcome and diagnostic shift.66 As well, being a subcategory under the rubric of ATPD, acute schizophrenia-like psychotic disorder was found to have low diagnostic stability and a large proportion of subjects with this diagnosis at baseline changed to schizophrenia later on.44,66 On the other hand, the temporal instability of unspecified psychosis was to a large extent expected due to its non-specific nature inherent in the diagnostic definition. Most of these patients were reclassified as schizophrenia at follow-up.13,17,28 Overall, the pattern of diagnostic shift echoed the concept of differentiation, which hypothesises that an initial atypical and non-specific clinical picture of functional psychosis might become clearer over time and evolve into prototypical categories such as schizophrenia and bipolar affective disorder.67

There was a lack of research into the factors associated with a change in diagnosis, in particular the diagnostic conversion towards schizophrenia spectrum. Owing to the paucity of relevant data, correlates of the diagnostic shift such as DUP and poorer premorbid adjustment were far from conclusive. Replication in future studies is required to confirm their predictive value.

With intensive and comprehensive assessments provided by a specialist team, schizophrenia and bipolar affective disorder can be reliably diagnosed in patients presenting with a first episode of psychosis. The findings of temporal instability in less common diagnostic entities, such as acute and brief psychoses and unspecified psychotic disorders, highlight the greater phenomenological fluidity in the early phase of psychotic illness and the difficulty with ascertaining an accurate diagnosis at the initial assessment. Since provision of particular treatment modalities is partly dependent on specific diagnostic categories, misdiagnosis may therefore expose patients and their families to inappropriate treatment and adverse psychological impacts.17 Additionally, information and education about their diagnosis has been shown to improve patients’ treatment adherence, illness outcomes, and their sense of well-being.68 As predictors of a diagnostic shift towards the more severe form of functional psychosis, i.e. schizophrenia, are yet to be established, it is therefore recommended that patients suffering from early psychosis should be kept under close scrutiny with thorough assessment and regular diagnostic reviews to minimise misclassification.

Diagnostic instability has implications for research. Schizophrenia is considered a lifetime diagnosis15 but previous studies have consistently demonstrated a diagnostic flux towards schizophrenia. A proportion of patients with this final longitudinal diagnosis were misclassified as having other psychotic disorders during the intake assessment. This underscores the need to recruit a broad spectrum of patients with functional psychosis instead of restricting the sample to those subjects who fulfill the diagnostic criteria for schizophrenia at baseline. Otherwise, a significant proportion of patients with a lifetime schizophrenia diagnosis, i.e. false-negative cases, will be missed at study entry and the validity of the research findings will be undermined by this misclassification bias.26,27

Several studies found that the ICD-10 definition of schizophrenia had diagnostic stability, specificity, and predictive validity comparable to those of the DSM-IIIR / DSM-IV schizophrenia diagnosis, but higher sensitivity.6,48 Some researchers stated that the DSM-IV definition with its 6-month duration criterion was overly restrictive and led to underdiagnosis of schizophrenia in the first-episode sample.13,49 The DSM-IV criteria for schizophrenia have also been criticised for only identifying a subgroup of patients with a more chronic illness and poorer prognosis.69 Thus it is suggested that the broader concept of schizophrenia as defined by the ICD-10 criteria might represent a clinically more useful definition for first-episode psychosis studies.6,48

From a nosological point of view, heterogeneity within, and a lack of clear boundaries between, certain psychotic disorders such as ATPD and schizoaffective disorder, as evidenced by frequent diagnostic shifts between them, reflect the insufficiency of the current taxonomy for classifying functional psychosis.16 The subdivision of ATPD into 6 categories has been criticised by some researchers70,71 and recognised by ICD-1061 as lacking empirical evidence. Simplifying this subclassification on the basis of the presence of polymorphic symptoms and reassigning acute schizophrenia-like psychotic disorder into schizophrenia spectrum might improve its applicability and validity.44,66,72 As well, the diagnostic instability and low inter-rater and clinical reliability of schizoaffective disorder73,74 indicate that the present diagnostic criteria are insufficient for defining this group of patients.75 It has been demonstrated that schizoaffective disorder is polymorphous in nature, showing frequent syndrome shifts among schizoaffective, pure mood and pure schizophrenic episodes along the illness course.14,37 Focusing on cross-sectional presentation, as adopted in current diagnostic systems, without taking into account the chronological perspective, makes it difficult to assign a diagnosis of schizoaffective disorder. Delineating the disorder into concurrent and sequential subtypes by incorporating the longitudinal course might help resolve the diagnostic dilemma and difficulty differentiating between subgroups of patients presenting with overlapping affective and schizophrenic symptoms both cross-sectionally and longitudinally.75

The main limitation of using temporal stability as an indicator for validating psychiatric diagnoses is that it is based on an implicit assumption of the existence of discrete entities associated with relatively unique clinical syndromes reflecting underlying biological dysfunction.3,76 As the reigning nosological paradigm in clinical and research psychiatry, the categorical approach has been criticised for the lack of evidence supporting its assumption of distinct entities in functional psychosis.3,44,76 Owing to the failure to identify zones of rarity77 and the presence of overlapping psychopathology,4 genetic susceptibility78 and neuroimaging findings79 in schizophrenia and bipolar affective disorder, a dimensional or continuum model has been proposed as a substitute for categorical classification across the psychotic spectrum.80 On the other hand, it has been recognised that discrete diagnostic entities and continuous variables are not mutually exclusive means of conceptualising psychiatric disorders.81 Depending on the focus of research questions, the choice and the use of a combination of categorical and dimensional representations may offer additional advantages and serve as complementary means of hypothesis testing.82 It is suggested that an accurate delineation of syndromes by minimising clinical heterogeneity, though not a prerequisite, paves the way for, and increases the likelihood of, aetiological discovery.7,82 In this context, although diagnostic stability is not the ultimate criterion for validity, in combination with other parameters, such as course of illness, treatment outcome,83 genetics and neurobiological deficits,84 it might assist with the establishment of more distinctive diagnostic categories in functional psychosis and facilitate investigation of underlying pathophysiological abnormalities.

Contemporary diagnostic systems have been shown to be relatively consistent for reconfirming the prototypical diagnostic entities, i.e. schizophrenia and bipolar affective disorder.13,15 Nevertheless, the diagnostic stability and patterns of diagnostic shift in less common psychotic disorders such as ATPD and delusional disorder are far from clear. Future studies focusing on these specific diagnostic subgroups with larger sample sizes will better elucidate their illness trajectories and the boundaries between the major psychotic disorders, schizophrenia and affective disorder. More research is also needed for evaluation of the longitudinal course of substance-induced psychosis and its diagnostic change to functional psychosis as the distinction between these two conditions has important implications in treatment strategies.29 Studies of this nature may also shed light on the potential aetiological links and neurobiological substrates of functional psychosis.85


