Monday, 29 August 2011

Composite endpoints in RCTs: what are they worth?

Composite endpoints in Phase 3 trials – doncha just love them?  Well I don’t.  As has been pointed out elsewhere, in terms of proving value in an economic evaluation, the first thing I want to do is pick the composite apart because I want to convert each bit into QALYs and savings to understand how that compares against the added cost.

An article has just been published that goes some way to illustrating this point:
‘Weighting components of composite end points in clinical trials: an approach using disability-adjusted life-years’ K.-S. Hong, L. Ali, S. Selco, G. Fonarow and J. Saver  Stroke 2011; 42: 1722-1729.

You need to read the original article but in brief they have focused on vascular endpoints and converted the common components into DALYs lost as follows:
7.63 DALYs lost per non-fatal stroke
5.14 DALYs lost per non-fatal MI
11.59 DALYs lost per vascular death
In DALY terms, therefore, if a non-fatal MI = 1 then a non-fatal stroke = 1.48 and vascular death = 2.25.

As a QALY-orientated economist my ideal would have been if they had used QALYs instead of DALYs, but I can understand the DALY disease weightings are more accessible than disutilities in QALY studies.  If you intend to use these results you also need to understand the different assumptions made – events happen at age 60, US life expectancy data, 3% discount rate, and an assumption that good health for older people is worth less than good health for younger people, and so on.  As I said, you have to read the article!

But with those gripes aside I think this is a fantastic illustration of the issue.  I’d like to take it one stage further because Hong and colleagues were thinking as clinicians and trying to produce a measure of health effect alone whereas I would be interested in savings as well.  Supposing I work in a system that is willing to pay £20,000 per QALY, and let’s assume for present purposes DALYs and QALYs are roughly equivalent.  Just to illustrate let’s assume that the lifetime discounted costs of managing events are as follows:
Non-fatal stroke £20,000
Non-fatal MI £4,000 without PCI, £10,000 with PCI
Vascular death £5,000
Then in QALY terms these are worth 1, 0.2 to 0.5, and 0.25 QALYs respectively.

Adding these back in to Hong et al’s figures, we get
Non-fatal stroke = 7.63 (health) + 1 (saving) = 8.63
Non-fatal MI = 5.14 (health) + either 0.2 or 0.5 (saving) = 5.34 to 5.64
Vascular death = 11.59 (health) + 0.25 (saving) = 11.84
Using the higher value of 5.64 for non-fatal MI and setting that to 1, the ratios are 1.53 (non-fatal stroke) and 2.1 (vascular death).
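
The arithmetic above can be re-run as a short sketch.  The DALY figures and costs are the ones quoted in this post; the variable and function names are mine, and the one-DALY-equals-one-QALY step is the same simplifying assumption made in the text.

```python
# DALYs lost per event (Hong et al) and the illustrative lifetime costs from this post.
WTP_PER_QALY = 20_000  # willingness to pay, in pounds per QALY

dalys_lost = {"non-fatal stroke": 7.63, "non-fatal MI": 5.14, "vascular death": 11.59}
# The higher (with PCI) cost is used for non-fatal MI.
cost_saved = {"non-fatal stroke": 20_000, "non-fatal MI": 10_000, "vascular death": 5_000}


def relative_weights(values, reference="non-fatal MI"):
    """Express each value as a multiple of the reference event."""
    return {event: value / values[reference] for event, value in values.items()}


# Health effect plus avoided cost converted to QALY-equivalents
# (assumes, as the text does, that 1 DALY is roughly 1 QALY).
combined = {e: dalys_lost[e] + cost_saved[e] / WTP_PER_QALY for e in dalys_lost}

health_only = relative_weights(dalys_lost)
with_savings = relative_weights(combined)

for event in dalys_lost:
    print(f"{event}: health only {health_only[event]:.2f}, "
          f"with savings {with_savings[event]:.2f}")
```

Changing WTP_PER_QALY or the cost figures shows how sensitive the relative weights are to the savings side of the calculation.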

I’m a little surprised as my intuition would be that there is a bigger gulf between non-fatal MI on the one hand and non-fatal stroke and vascular death on the other.  I don’t perceive the disability consequence of a non-fatal MI to be anywhere near that of a stroke.  Vascular death also seems to me a major loss of DALYs or QALYs, losing years of life at 0.6 or 0.7 quality, whereas an MI might be the difference between 70% and 60% over a decade.
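
To put rough numbers on that intuition, here is a back-of-envelope sketch.  The 3% discount rate is the one Hong et al used; the 20 remaining years of life and the utility of 0.65 are my own illustrative assumptions, as is the function name.

```python
def discounted_qalys(utility, years, rate=0.03):
    """Sum utility over whole years, discounting year t by (1 + rate) ** t."""
    return sum(utility / (1 + rate) ** t for t in range(years))


# Assumption: a vascular death forfeits 20 further years lived at a utility of 0.65.
loss_from_death = discounted_qalys(0.65, 20)
# A non-fatal MI modelled as a drop from 0.70 to 0.60 lasting ten years.
loss_from_mi = discounted_qalys(0.70 - 0.60, 10)

print(f"vascular death: {loss_from_death:.2f} QALYs lost")
print(f"non-fatal MI:   {loss_from_mi:.2f} QALYs lost")
print(f"ratio:          {loss_from_death / loss_from_mi:.1f}")
```

On these made-up inputs the gap comes out at roughly elevenfold, far wider than the 2.25 ratio in the paper, which is why the published weights surprised me.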

But that is to lose sight of the main point of this article which is to put in the public domain something to get this sort of debate started.  Thank you to Hong and colleagues!

Wednesday, 24 August 2011

Utility values for diabetes

The search for consistency (and its desirability)

One of the emerging themes of this blog is the extent to which we can standardise aspects of producing HTA evidence and a paper I have just read by Lung et al is an illustration:

Lung TW, Hayes AJ, Hayen A, Farmer A, Clarke PM.
Qual Life Res. 2011 Apr 7. [Epub ahead of print]

This team carried out a literature search so thorough it has me worrying for their psychological stability to identify studies that used one of the QALY-compatible preference measures like EQ-5D or an SF measure, or which used time trade-off or standard gamble in people with diabetes.

They report huge ranges in the values obtained from a humble 14 points for diabetes with no complications (i.e. lowest was 0.74, highest was 0.88) through to 48 points (stroke and end-stage renal disease).  It’s obvious that stroke, for example, would depend on severity and ESRD might depend on whether the person required dialysis and, if they did, whether it was hospital or home based.  However, if an HTA organisation accepts published values from the literature, say because it used its preferred utility elicitation technique, it has handed a substantial element of choice to the people writing the HTA submission.  For ESRD in diabetes will we use a utility value of 0.33 citing source study A or 0.81 citing source study B? 

Using stats techniques I am too dull to understand they then carried out two analyses that I will pick out, a random effects meta-analysis (MA) and a random effects meta regression (MR).

The MA gives a point estimate of the mean utility value across the studies but, as important, it provides a 95% confidence interval (and sample size) ideal for use in sensitivity analysis.  For example, the diabetes with no complications state had values from individual studies from 0.74 to 0.88 but in the MA the mean value was 0.81, 95% confidence interval 0.78 to 0.84.  Fantastic, as a reviewer of HTA submissions, that is so helpful.
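
For anyone who, like me, wants to see the mechanics, here is a minimal sketch of one common random effects approach, the DerSimonian-Laird estimator.  I am not claiming this is the exact method Lung et al used, and the study values and standard errors below are invented for illustration.

```python
import math


def random_effects_mean(estimates, std_errors):
    """Pool study means using DerSimonian-Laird random effects weights."""
    k = len(estimates)
    w = [1 / se ** 2 for se in std_errors]  # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled


# Hypothetical study-level utilities and standard errors (not taken from Lung et al).
utilities = [0.74, 0.79, 0.82, 0.85, 0.88]
std_errs = [0.02, 0.03, 0.015, 0.025, 0.02]

mean, lower, upper = random_effects_mean(utilities, std_errs)
print(f"pooled utility {mean:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The lower and upper values are what a reviewer would plug into a sensitivity analysis in place of the raw range across studies.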

In the MR they analyse how much of the variation between the estimates in different studies can be explained by measured features such as the age of the patients, their sex, and the elicitation technique.  Older age and being female led to lower utilities (<<insert joke of your choice>>), and of the elicitation techniques TTO and SG (combined) gave higher values than EQ-5D which, in turn, gave higher values than HUI-3 and SF-6D (combined).    

Tom Lung and team, take a bow, my grateful thanks.  The only other study like this I am aware of is in stroke:

Tengs TO, Lin TH.
Pharmacoeconomics. 2003;21(3):191-200.

Are other people aware of similar studies?  And how should we use them? 

Clearly, the ideal is still that companies measure quality of life in their trials in a way that is compatible with QALYs.  However, in this ‘second best’ situation, I think there is a strong case for making values from these meta-analyses the default settings for an HTA submission.  Of course I would be interested in listening to a company’s arguments for why this should not apply to their particular submission – for example, suppose it could be shown that for a particular treatment the ESRD experienced secondary to diabetes was always of a milder type than a utility value of 0.48 (from Lung et al’s meta-analysis) would imply.

But what we all need to get away from is where an HTA submission can select between two hugely different utility values and cite a supporting reference with equal authority.  This should work for companies as well as it gives them greater certainty when they are estimating the likely cost per QALY at an early stage of a product’s life and it may save them some money on commissioning their own utility surveys. 

Friday, 19 August 2011

Cost per QALY with AND without the PAS

A tip for those submitting to the SMC

When you are including economics results, including sensitivity analysis, and your submission includes a patient access scheme (PAS), always quote the cost per QALY with and without the PAS.  It might be self-evident to you that the PAS will be accepted and therefore this is the only relevant version, but until PASAG make their decision we have to have both figures available.  This applies to the submission itself, replying to questions from the SMC and the comment companies make on the draft guidance following NDC.
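
As a sketch of what reporting both versions looks like in practice, the snippet below works out a cost per QALY at list price and again with a simple percentage discount.  Every figure and name in it is hypothetical, and a real PAS may be structured quite differently from a flat discount.

```python
def cost_per_qaly(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if incremental_qalys <= 0:
        raise ValueError("ratio is only meaningful for a positive QALY gain")
    return incremental_cost / incremental_qalys


# All figures below are hypothetical.
drug_cost_list_price = 30_000  # lifetime cost of the new medicine at list price
comparator_drug_cost = 8_000   # lifetime cost of the comparator medicine
other_costs = -2_000           # net change in all other NHS costs
qaly_gain = 0.8
pas_discount = 0.25            # assumed simple percentage discount scheme


def icer(discount):
    """Cost per QALY for a given proportional discount on the new medicine."""
    drug_cost = drug_cost_list_price * (1 - discount)
    incremental_cost = drug_cost - comparator_drug_cost + other_costs
    return cost_per_qaly(incremental_cost, qaly_gain)


without_pas = icer(0.0)
with_pas = icer(pas_discount)
print(f"cost per QALY without PAS: £{without_pas:,.0f}")
print(f"cost per QALY with PAS:    £{with_pas:,.0f}")
```

The same pair of figures should then be carried through every sensitivity analysis, not just the base case.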

Thursday, 18 August 2011

Media coverage of NICE

Is a constant source of mystery to me

On Wednesday 17th August, NICE placed its ACD guidance on dabigatran for atrial fibrillation on its website.  So far as I can tell no national media organization covered the story at all.  That may be a good thing – I am not an advocate of NICE bashing (except when they deserve it, obviously).  But how do you explain zero interest, none, nil, zilch, zip?

It could be that atrial fibrillation is not in the public’s consciousness (I bet it doesn’t feature on many people’s ‘top ten most feared diseases’) but it does lead to stroke and ‘bleed-on-the-brain’ (intracranial haemorrhage), which most people could easily relate to.  “NICE vetoes ‘stroke hope’ drug” seems a plausible headline to me.

If we’re talking about obvious ‘human interest’ angles to make this story about numbers seem more interesting there is the fact that the current treatment is warfarin, famously also used as rat poison.  Headline: “NICE saves money by making patients eat rat poison”.

It can’t be that it’s not important: given the number of people with atrial fibrillation the budget impact of this medicine could run into hundreds of millions, money which the NHS doesn’t have to spare at the moment.  Headline: “£2.52 per day ‘unaffordable’ for stroke drug”.

Is every health correspondent on holiday at the same time?  Is it because no patient group has issued a press release?  Or has “NICE fatigue” set in, as newsdesks have their fill of riots, phone-hacking, and global economic crisis?

Wednesday, 17 August 2011

Post hoc sub-groups

Are more suspicious when the budget impact is over £200 million

HTA reviewers are quite rightly trained to be suspicious of post hoc sub-group analyses of clinical data.  We all know that if you do enough analyses you will eventually find something passing a test of statistical significance.
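
A quick sketch of why: if each analysis is an independent test at the conventional 5% level (independence is a simplifying assumption, and the function name is mine), the chance of at least one spurious ‘significant’ result climbs quickly with the number of analyses.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is significant by chance."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 10, 20):
    print(f"{n:>2} sub-group analyses: {chance_of_false_positive(n):.0%}")
```
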
NICE published an interesting example on 17th August in its ACD guidance on dabigatran in atrial fibrillation.

The RE-LY trial had three arms, a dabigatran 150mg dose, a dabigatran 110mg dose and warfarin.  The EMA licensed what the maker, Boehringer, call a sequence option, of 150mg dabigatran to the age of 80 and 110mg thereafter (with some scope for clinician judgement on bleeding risk in between).

NICE’s assessment obviously has to focus on the licensed sequence but the clinical data underpinning the license did not come from a pre-planned analysis, and this made the Appraisal Committee unhappy; indeed, this is the most substantive criticism in the document in my opinion, and seemingly the basis for a request for re-worked analysis.  

Boehringer’s model of the sequence used data from the trial on people under the age of 80 for the efficacy of the 150mg dose and data from the trial on people aged over 80 for the 110mg dose.  The Appraisal Committee said this was not appropriate and asked for the efficacy of the two doses to be based on data from the trial for people of all ages.

Question 1: I don’t know who first asked for this post hoc analysis split at the age of 80 – was it Boehringer or was it the EMA?  If it was the EMA and they thought it was adequate as the basis for a license then does that put the post hoc sub-group analysis in a different light?  Or does the potential budget impact of this medicine in this indication over-ride that?

Question 2: I think I am right in saying that bleeding risk is age-related, therefore the basis for the sequence has clinical plausibility.  But if this is the case, isn’t there a trade-off to be made between what NICE sees as the inappropriateness of post hoc sub-group analysis versus its clinical plausibility and the relevance of the age-specific efficacy data?  It’s not obvious from the ACD that was considered.

Tuesday, 16 August 2011

Mapping to QALYs

Mapping to QALYs: how do I know if that’s good enough?

In HTA, QALYs are the least-bad measure of the value of health outcomes that we have so we have to make them work.  The best way, we think, is by measuring outcomes prospectively in a clinical trial.  When we don’t have that, we reach for a ‘second best’ such as taking values from a previously published study or by using a disease-specific outcome and then estimating a conversion equation to derive quality of life values (utilities) to use as the weights in QALYs.  This is called mapping.
But this poses a challenge to people like me tasked with reviewing the quality of an economic evaluation: what should we be looking out for in mapping studies and what standards should we accept?

Most of the literature I could find on ‘good practice’ focused on the derivation of the relationship between the disease-specific scale and utilities in the first place; less attention has been paid to how these estimates are then used by other researchers.

Useful references were as follows:
‘A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures.’ Brazier JE, Yang Y, Tsuchiya A, Rowen DL.  Eur J Health Econ. 2010 Apr;11(2):215-25.

‘Do estimates of cost-utility based on the EQ-5D differ from those based on the mapping of utility scores?’ Barton GR, Sach TH, Jenkinson C, Avery AJ, Doherty M, Muir KR.  Health Qual Life Outcomes. 2008 Jul 14;6:51.

In terms of the studies estimating the disease-specific to utility relationship, a reviewer should bear in mind the following:

1. Mapping must be based on data, not opinion
There is general agreement that using the opinion of clinicians to predict how a patient would have answered an EQ-5D questionnaire based on their responses to a disease-specific questionnaire is no longer acceptable.

2. The minimum expectation is linear regression analysis. 
Linear regression analysis is commonly used but because the utility scale is bounded the results may be biased and inconsistent.  Especially if a large proportion of subjects are in full health, better options may be Tobit, censored least absolute deviations or restricted maximum likelihood (Brazier).

3. Different models should be tested.
Some evidence suggests simple methods are best with added complexity having little value.  Other studies have found that squared terms and interactions, as well as patient characteristics, improve the fit.  For example, Barton et al developed five models to predict utilities in rheumatoid arthritis:
Model A: total WOMAC score only
Model B: pain, stiffness, functioning (sub-scales of WOMAC)
Model C: total WOMAC and total WOMAC²
Model D: pain, functioning, pain*functioning, pain*stiffness, stiffness*functioning, pain², stiffness², functioning²
Model E: best of models A through D plus age and sex of patient
The final model, E, was found to have the best fit.

4. A goodness-of-fit test should be reported. 
Brazier et al’s review found R² was the most commonly used measure – when mapping a generic to a generic (e.g. SF-36 to EQ-5D) a value of 0.5 can be achieved.  Of the 30 studies they identified one of the lowest R² values for a disease-specific to generic was 0.17 so if your R² is at this level you have a problem.

5. The key test is the ability to predict.
The main reason for estimating the quantitative relationship between the disease-specific measure and utilities is to be able to predict.  An essential part of the validation of a model is therefore to test it, either by dividing the original sample and estimating the relationship on one half and testing it on the other, by obtaining a second dataset using the same disease-specific outcome, or a similar test.
Brazier et al propose that a measure of prediction error such as Mean Absolute Error (MAE) should be used; an example is Barton et al, who defined it as the average absolute difference between actual and predicted values.  Barton et al briefly review other studies and found an MAE of 0.13 was the lowest observed (where low is good) and 0.19 was the highest observed.  Brazier et al found lower MAEs but it was unclear if these were for disease-specific to generic mapping studies.
Plotting errors against EQ-5D-based utilities can also be helpful; there may be a tendency for utilities to be underestimated for those in better health and over-estimated for those in poorer health.
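
Points 4 and 5 can be sketched together: fit on one half of a sample, report R² there, then check prediction error on the other half.  This is a toy example on synthetic data with a single-predictor linear model; the data, coefficients and function names are all invented and are not taken from Barton et al or Brazier et al.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(actual, predicted):
    """Share of the variance in actual values explained by the predictions."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic data: a 0-100 disease-specific score (higher = worse) and a
# noisy utility capped at 1 (full health).
rng = random.Random(0)
scores = [rng.uniform(0, 100) for _ in range(400)]
utilities = [min(1.0, 0.95 - 0.006 * s + rng.gauss(0, 0.12)) for s in scores]

# Estimate the relationship on one half and test it on the other.
train_x, test_x = scores[:200], scores[200:]
train_y, test_y = utilities[:200], utilities[200:]
intercept, slope = fit_line(train_x, train_y)
fitted = [intercept + slope * x for x in train_x]
predicted = [intercept + slope * x for x in test_x]
mae = mean_absolute_error(test_y, predicted)

# Mean signed error (actual minus predicted) for the less healthy and
# healthier halves of the validation sample, ordered by actual utility.
pairs = sorted(zip(test_y, predicted))
half = len(pairs) // 2
low_error = sum(a - p for a, p in pairs[:half]) / half
high_error = sum(a - p for a, p in pairs[half:]) / (len(pairs) - half)

print(f"R-squared (estimation half): {r_squared(train_y, fitted):.2f}")
print(f"MAE (validation half):       {mae:.3f}")
print(f"mean error, poorer health:   {low_error:+.3f}")
print(f"mean error, better health:   {high_error:+.3f}")
```

The last two lines are a crude numerical stand-in for the plot of errors against actual utilities: a negative figure means the model over-estimates, a positive one means it under-estimates.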

So that gives me some idea of what to look for in the source study, but what about how it was applied in the submission I am reading? 

My first issue is: what is the hierarchy of sources for utility values?  Presumably we place most faith in direct measurement of an instrument such as EQ-5D in a trial but how does mapping compare with – say – time trade-off (TTO) valuation of descriptions of health states?  The balance would seem to lie with mapping because the disease-specific scale was measured in the trial whereas the TTO values are for descriptions which are then applied retrospectively to how the patients MAY have been feeling, which seems to introduce potential biases.

This raises another interesting question: suppose the study in front of me used TTO (or similar) but I know that a secondary outcome measure could have been mapped to estimate QALYs using an existing study – as a reviewer, should I insist this is carried out?  On the one hand I know mapping studies are imperfect; on the other, they are based on patients’ self-assessed health.  The answer is that I would probably request a sensitivity analysis using mapping, but I realise that is a way of ducking!

The second issue is how I would know there WAS an existing mapping study in the first place.  The only published review I could find, by Brazier and colleagues, covered 30 studies up to 2007, but several were establishing relationships between generic instruments and others were unpublished studies; none used cancer-specific scales (the topic I was interested in).  So do I have to conduct a literature search each time I see a utility that isn’t based on the gold standard method?  Should the onus be on the pharma company to establish whether a mapping study is available?  A great solution would be to construct a database of mapping studies.  I’ve made a start; please e-mail me if you are interested.

The third issue is: if there are several mapping studies available, which should I use?  For example, I am interested in the EORTC QLQ-C30 measure and even a brief literature search identified three different studies, with a search of the references identifying as many again.  The QLQ-C30 measure is intended to be used across many different types of cancer, but does this mean the mapping algorithm is equally applicable – if the medicine I am reviewing is for colorectal cancer then is a QLQ-C30 model derived from oesophageal cancer patients and validated on breast cancer patients applicable or not?  Should I be influenced by where the sample of patients for the mapping came from – for example, one of the QLQ-C30 studies was in Greek cancer patients, so should I discount it because I work in Scotland?  The most advice I could find was that patients in the original mapping study should have similar QLQ-C30 scores to the ones receiving the treatment I am interested in, but what does that mean in practice?  With six different mapping studies (at least) I can’t even use my normal trick of running a sensitivity analysis as there seems a good chance at least one of the algorithms will give a different answer to the base case.  Ideally I’d like to pick the one with the most robust statistical method, but I don’t think there is currently guidance to help me do that.

Our approach to mapping seems to be evolving in an ad hoc way.  Some bits of the jigsaw are available but there are a lot of gaps.  I’ve started to piece some of them together but would like to hear from anyone who thinks I’ve got anything wrong or who can fill in the gaps.
Thoughts on a database of mapping studies are very welcome.  (Stop press: I’ve just found another review of published algorithms:
‘Comparing the Incomparable? A Systematic Review of Competing Techniques for Converting Descriptive Measures of Health Status into QALY-Weights’  Duncan Mortimer and Leonie Segal  Med Decis Making 2008; 28; 66.)



My name is Andrew Walker and I am an economist interested in health technology assessment, especially the evaluation of new medicines.  I work at Glasgow University but my main role is as an economics reviewer for the Scottish Medicines Consortium.  THE VIEWS ON THIS BLOG ARE MY OWN and they should not be thought of as SMC policy (the clue is in the title of the blog).  SMC know my views on most things (as if they have a choice!) but my main concern has been that I respect the confidentiality undertaking all committee members and reviewers sign - my comments relate only to general methods issues or guidance published on the SMC website.
My aims are to promote discussion and understanding, and occasionally to entertain. But mainly it’s a collection of my neuroses and anxieties.  Despite this please feel free to comment, either on the website or by e-mailing me.

Charles River: international HTA organisations

The CRA ratings: how did Scotland do?

The consultancy company Charles River Associates recently released a report “A Comparative Analysis of the Role and Impact of Health Technology Assessment”.  Commissioned by the European pharmaceutical industry, it rated the HTA systems of 15 countries against 44 criteria using a traffic light system (green = good, red = bad).

First, the good news …
Scotland was rated as ‘green’ (the best rating) on 26 of the criteria, amber on 7, and red on 10 (with 1 deemed not applicable).

Only 2 countries had more green ratings (28 for Australia, 27 for Sweden). 

The following table compares four key criteria based on data for 2009, including 12 medicines used as case studies across each system.

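[Table only partly recoverable.  Column headings: Guidance issued; Typical speed (days); Yes or no? (5 = all yes, 1 = all no); Cost of agency.  Surviving entries, in their original order: € 1m, € 7m, € 11m, € 4m, New Zealand, € 3m, € 23m, Italy (national), € 2m.]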

Data sources for the table:
Guidance issued – taken from CRA pages 62-63, Table 18.
Speed – median days from approval by EMA to HTA guidance (approximations taken from bar chart), taken from CRA page 88, Figure 24.
Yes or no? – scoring based on Raftery’s classification where CRA assumed accepted without restriction = 5, accepted with minor restriction = 4, accepted with major restriction = 3, accepted with further evidence required = 2, rejected = 1, taken from CRA page 93, Figure 26.
Cost – taken from CRA page 71, Table 25.
South Korea, Brazil and Turkey excluded due to lack of data.
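
As a sketch of how the ‘Yes or no?’ column could be produced from the five-point scale described above: the snippet assumes the score is a simple average across the case-study medicines, which is my reading rather than something CRA state, and the decisions listed are hypothetical.

```python
# Five-point scale as described in the data sources above.
SCORES = {
    "accepted without restriction": 5,
    "accepted with minor restriction": 4,
    "accepted with major restriction": 3,
    "accepted with further evidence required": 2,
    "rejected": 1,
}


def yes_or_no_score(decisions):
    """Average score across case-study decisions (5 = all yes, 1 = all no)."""
    return sum(SCORES[d] for d in decisions) / len(decisions)


# Hypothetical decisions for 12 case-study medicines.
decisions = (
    ["accepted without restriction"] * 4
    + ["accepted with minor restriction"] * 3
    + ["accepted with major restriction"] * 2
    + ["rejected"] * 3
)
print(f"{yes_or_no_score(decisions):.2f}")
```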

CRA commented that Scotland was the only country where there was a “relationship between the therapeutic value of the medicine … and the speed of the review, with higher value products progressing more quickly through the review.” (page 88)

Scotland was rated as green on the following criteria, which seem important to public confidence in the system:
  • The rationale for HTA decisions / recommendations is clearly stated
  • The process and rationale for selecting and prioritising topics is clearly defined and publicly available
  • The approach used in HTA is clearly stated and the methods are deemed appropriate by experts
  • A number of relevant stakeholders are invited to contribute to the HTA process and they are involved throughout the HTA process with opportunity for contribution to assessment methodology, submission of evidence, review of recommendations
  • Outcomes are available on a publicly available website and decisions are explained in several levels of clinical/technical detail so that all relevant audiences may understand the decision
  • Length of time taken for reviews

The cost per piece of guidance issued was € 12,195 in Scotland.  The only other country that came close to this figure was Poland at € 45,455.  Other costs were as follows:
€ 142,857 Canada
€ 333,333 Spain
€ 366,667 Sweden
€ 411,765 England
€ 3,833,333 Germany
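
The calculation behind these figures is simply agency cost divided by the number of pieces of guidance issued.  In the sketch below the two agencies and their inputs are hypothetical: they were chosen because they happen to reproduce the lowest and highest figures quoted above, not because they are taken from the CRA report.

```python
def cost_per_guidance(agency_cost_eur, guidance_count):
    """Annual agency cost divided by the number of pieces of guidance issued."""
    if guidance_count <= 0:
        raise ValueError("guidance_count must be positive")
    return agency_cost_eur / guidance_count


# Hypothetical agencies: (annual cost in euros, pieces of guidance issued).
agencies = {"Agency A": (1_000_000, 82), "Agency B": (23_000_000, 6)}

for name, (cost, count) in agencies.items():
    print(f"{name}: € {cost_per_guidance(cost, count):,.0f} per piece of guidance")
```

Because the agency costs are rounded to the nearest € 1m, small budgets with a high output of guidance are the most sensitive to rounding.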

So on the positive side Scotland has much to commend its system of new medicines evaluation: in terms of transparency of process, involvement of stakeholders, speed and the value for money of the system itself (guidance produced for the budget).

Yes but …
While it is important to disseminate the good news, continuous improvement is not based on self-congratulation, of course.  Why didn’t Scotland do even better, and why did it receive 10 red ratings in particular?

Seeing red
The 10 criteria where CRA rated Scotland as red were as follows:

1. Scientific advice is available to manufacturers during development stage to enable the availability of evidence required for HTA.  SMC was assessed as offering no advice – it is certainly true that SMC has no formal role in this area.  Offering informal advice would have given an amber rating, but it was not clear how informal this advice could be.  For example, SMC posts its guidance for a submission on its website: that sets out the requirements and it is then up to the company how to interpret those.  There may even be dangers in getting involved in decisions about clinical trials if this subsequently restricts criticisms the HTA organisation can make of the submission.

2. and 3. HTA is conducted for pharmaceuticals, devices, procedures, diagnostics and treatment strategies (1 criterion) and proportion of guidance in 2009 that was for non-medicines technologies (1 criterion).  This is not a part of the SMC’s remit, so the low assessment is unavoidable.

4. and 5. HTA is conducted for old as well as new technologies (1 criterion) and proportion of HTAs conducted for old technologies (1 criterion).  Once again, assessing old medicines is not a part of the SMC’s remit, so it gets the lowest assessment.  Taking account of 2. and 3. above, it seems slightly odd that CRA feel an HTA organisation that just does new medicines is inherently lower-rated than an HTA organisation that does everything.  The counter-argument would be that the specialisation of concentrating on one task has advantages.

6. Proportion of 12 case studies including information on societal benefits (in guidance).  SMC was rated as providing no evidence in its guidance that factors such as productivity and costs to carers had been considered.  Some evidence was required for an amber rating and evidence in at least 7 of 12 case studies for a green rating.  SMC has stated that its main analysis will focus on NHS costs and savings and health benefits measured in QALYs, but the guidance states other factors are not ruled out and may be considered.

7. Low number of appeals against HTA decisions.  The report was based on calendar year 2009 in which there were no appeals at SMC (called ‘independent reviews of process’).  This is given a red rating.  Amber would have been a few appeals and green some successful appeals.  NICE was rated as green because it has had 58 appeals in 10 years, 23 of which have been successful.  It seems strange that a system that generates no appeals is given a worse rating – would the world have been a better place if SMC had said no to all fringe cases and then reversed some of those decisions on appeal?  The CRA rating ignores the fact that companies can resubmit their case in the SMC system at a time of their choosing after meeting with the SMC to discuss ‘not recommended’ guidance.  Around 50% of resubmissions are accepted.

8. Guidance doesn’t identify gaps in evidence base requiring further research.  SMC do not see this as being part of their remit.  An amber rating involved providing some suggested additional evidence requirements without being specific.

9. Product (medicine) should be accessible / reimbursable before guidance issued.  The report says that in theory medicines can be used before guidance is issued but in practice local decision-makers will restrict use.  SMC aim to produce their guidance very rapidly after the licensing decision so there is no period of ‘planning blight’ where there is no guidance; indeed, where a company delays too long before submitting, interim ‘not recommended’ guidance is issued; companies can submit subsequently.  The speed of the SMC system is widely admired and it seems strange to have a red rating on this criterion.

10. Value of HTA to the health care system should be measured.  Scotland was assessed as having no evidence of measuring the value to the healthcare system.  An amber rating required some evidence and green rating required systematic measurement.  Sweden was awarded a green rating for making estimates of the savings to be realised from its evaluations of therapeutic classes.

Summary.  Of the 10 red ratings, 4 were outside the SMC’s remit.  Three others would have cost/time implications (providing advice before submissions, identifying research gaps, estimating the value to the healthcare system). Two red ratings are highly debatable: it is not clear why having no appeals is taken as a mark of failure, and it is not clear why having medicines available before national guidance is issued is a good thing – both these criteria could be seen as coming from the pharma companies point-of-view.  A further criterion relating to societal costs and benefits is acknowledged by CRA to be ‘controversial’.

Amber gamblers?

In addition to the red ratings the 7 criteria where SMC was rated as amber were as follows:

1. HTA is conducted independently of parties with a vested interest in the outcome – SMC was rated as “sometimes influenced by payers or other parties with a vested interest in the outcome”, whereas a green rating would have been that HTA was conducted independently of payers and other parties with a vested interest in the outcome.  CRA seem to have defined a vested interest purely in terms of the involvement of payers (e.g. NHS boards in Scotland or PCTs in England).  This ignores other partners at the table, such as the ABPI which has three seats on SMC, and goes against the preferred model of partnership working.

2. HTA considers unpublished trial data – SMC was rated as considering unpublished data in limited circumstances, whereas a green rating would have been “routine consideration of unpublished data where appropriate”.  This seems to have been based on CRA’s interpretation of a comment in SMC’s guidance to submitting companies.  The CRA does not report this but the gist seems to be that SMC said that if a published paper and an unpublished paper are available reporting the same thing then the published version will be preferred.  It’s hard to see why the SMC is wrong about this: surely a paper published in a peer-reviewed journal is preferable, other things being equal?  However, SMC has a long record of considering unpublished data, including some that could not be published in its final guidance so the amber rating seems harsh.

3. HTA takes into account cost on the public purse; non-healthcare and indirect costs and benefits to patients and society.  This rating is based on non-healthcare factors being allowed in guidance to companies whereas the green rating was given when submissions were required to have this information.  As stated above, this is not currently part of SMC’s remit.

4. Proportion of assessments which are re-evaluations.  This relates to whether there would be a re-evaluation when there is new evidence (not a resubmission following an initial ‘not recommended’ decision).  An amber rating implies a few re-evaluations were carried out whereas a green rating would require routine re-evaluation.  Pharma companies can resubmit their case whenever new evidence becomes available on a product, so it is not clear Scottish patients are losing out.

5. Relationship between HTA and reimbursement restrictions.  Scotland was rated as having a system where this relationship was ‘somewhat’ defined; a green rating would have been a formal and clear definition.  CRA looked at decisions by three Scottish NHS boards for the 12 case study medicines and found some had not made a decision about whether to accept the medicine locally, several months after SMC guidance had been issued and in two cases local NHS boards had decided not to recommend a medicine accepted by SMC.

6. Impact on diffusion.  SMC was rated amber because there was said to be evidence of negative impact of HTA on medicines diffusion in some cases.  A green rating would have been no evidence of a negative impact.  The basis seems to be the claim that Scotland has lower survival rates than other countries – while it was not clear, this seems to have been linked to other evidence that Scotland says no more often to medicines or is more restrictive when it says yes.  At best, this point seems controversial.

7. Explicit treatment of innovation.  This criterion had three tests: early dialogue with companies, transparent assessment process, and explicit consideration of innovation.  SMC was assessed as achieving 2 of these.  A green rating required all three be achieved.

Summary.  Of the 7 amber ratings, SMC could take issue with 3 (sometimes being pressured by vested interests, no successful appeals, not routinely considering unpublished evidence).  The evidence base for there being a negative impact of SMC decisions on diffusion was not available and is questionable.  The link to re-imbursement decisions is a matter for the Scottish Government Health Department.  SMC has decided not to hold early dialogue with companies about submissions.

There is much that is positive for SMC and Scotland in the CRA Report.  The Scottish system gives similar decisions to other agencies but does so at least as quickly (and far quicker than some) and much more cheaply.  The CRA report highlights several aspects of good practice in the SMC’s way of working.
Where the ratings are not so good, Scotland needs to understand why.  There are some ratings that are highly debatable, some that apply at a Scotland level outside of SMC’s remit, and others that are outside of SMC’s agreed pattern of working.
The report is designed to be the basis for a repeat exercise in future years. CRA and its sponsors are to be congratulated on an excellent starting report, full of useful information and comparisons.  Some of the criteria (and the way they were applied) could be usefully debated before the second version is commissioned.