Cool graph in 5 seconds

Cool graph in 5 seconds. Open R. Type


library(rgl)

with(mtcars, plot3d(wt, disp, mpg, col="red", size=3))

Spin the graph with your mouse. Cool!

(Hat-tip to the mighty Quick-R from whom I stole the code)

Privacy and DNA

Reblogged from Stats in the Wild:

A Harvard professor has re-identified the names of more than 40% of a sample of anonymous participants in a high-profile DNA study, highlighting the dangers that ever greater amounts of personal data available in the Internet era could unravel personal secrets.

Read the whole story here.  Also, why not check out how unique your birthdate, zip code, and gender are by going…

Interesting story, but it's worth noting that participants were identified from postcode, date of birth, and gender, whereas a useful open dataset would include only region, age, and gender. So I'm a 34-year-old man from Nottingham. If that's all you had, lots of luck identifying me.
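To make the difference concrete, here is a minimal sketch in base R that counts what share of records are unique on each set of fields. Everything in it is an assumption for illustration: the records are synthetic, the postcode labels are invented, and `share_unique` is a hypothetical helper, not part of any package.

```r
# Synthetic records only: none of this is real data.
set.seed(1)
n <- 5000
people <- data.frame(
  postcode = sample(sprintf("NG%02d", 1:25), n, replace = TRUE),  # invented labels
  dob      = as.Date("1950-01-01") + sample(0:18000, n, replace = TRUE),
  gender   = sample(c("M", "F"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Coarsened fields, as an open dataset might publish them.
people$region <- "Nottingham"
people$age <- as.integer(floor(as.numeric(as.Date("2013-01-01") - people$dob) / 365.25))

# Hypothetical helper: share of records whose combination of `cols` appears only once.
share_unique <- function(df, cols) {
  key <- do.call(paste, c(df[cols], sep = "|"))
  counts <- table(key)
  mean(counts[key] == 1)
}

fine   <- share_unique(people, c("postcode", "dob", "gender"))
coarse <- share_unique(people, c("region", "age", "gender"))
print(c(fine = fine, coarse = coarse))
```

With fields this fine-grained nearly every synthetic record is one of a kind, whereas the coarsened fields put each person in a large crowd.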

What does your maternity leave look like?

Now THIS is a visualisation: the same child photographed at ages corresponding to statutory paid maternity leave in different countries.

All Information is Actionable

Reblogged from Stats With Cats Blog:

Business managers occasionally complain that information they are presented with isn’t actionable. This usually pisses off the data analysts who spent a great deal of effort acquiring data and turning them into information.

Data analysts will tell you that actionable information is:

  • Trusted — Based on credible data, generated in a way or obtained from a source that provided adequate quality checks.

A brilliant post on the conflict between the needs of analysts and decision makers... with cats! Genius post on a genius blog.

Data in the NHS

I wrote a little piece for an information pack related to the Patient Feedback Challenge so I thought I may as well share it here. I’m a little rushed so I won’t hyperlink the references; apologies for this. They are all at the bottom in plain text.

NHS data systems are, quite naturally, built around very strict data management systems which emphasise the security of data and ensure accountability for data loss or misuse. This is right and proper. However, very frequently, clinicians, managers, auditors and evaluators do not need to access highly sensitive personal information. In certain cases, information can be shared freely over the internet, as with the data collected on patient experience within the Patient Feedback Challenge (survey responses, Patient Opinion stories, and specific extracts from the PALS database).

The skills and technologies to rapidly analyse and disseminate data of this kind, where access rights are not an issue, are presently lacking in many NHS settings. Data analytic solutions, where they exist, are often large and complex, and expensive to set up and modify. Where solutions are brought in from private sector providers these tend to be inflexible, and it can be difficult for an organisation to achieve the results it wants using off-the-shelf technologies (for example, integrating and resharing different types of data, such as the survey, Patient Opinion, and PALS data involved within the Patient Feedback Challenge).

A further limitation in the data architecture of many NHS organisations is the tendency for data to exist in silos, with each department maintaining its own separate database with no capacity to combine or reshare any of the data. Often the existence and location of datasets is unknown to others working elsewhere in a Trust.

Another problem endemic in many NHS settings is Spreadsheet addiction (e.g. Burns, 2004). This has been given as a possible culprit in JP Morgan Chase’s having lost 2 billion dollars in May 2012 (from http://goo.gl/SSAMS):

‘James Kwak, associate professor at the University of Connecticut School of Law… noted some interesting facts in JP Morgan Chase’s post-mortem investigation of the losses. Specifically, that the Value at Risk (VaR) model that underpinned the hedging strategy:

“operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another”, and “that it should be automated” but never was.’

The process of analysing and presenting Service User and Carer Experience survey data began in a similar fashion, with spreadsheets and macros being used to produce summary graphics, and reports manually cut and pasted together. Fortunately for us, before we lost 2 billion dollars this way there was a mishap with the process and it became obvious that the process needed to go from “should be automated” to “has been automated” in order to meet the reporting timescales. An incremental approach to automation was adopted, and it’s worth noting that three years ago the lead analyst on the survey had no programming experience whatsoever. Each quarter the survey reporting would demand another layer of complexity and the analyst would learn out of books and by accessing helpful online communities how to perform whatever new task was demanded. As with most technological movements, automation did not bring a reduction in workload but rather more sophisticated and better results, and within a few years reporting on survey data had gone from a task that was difficult for one person to perform quarterly alongside other tasks to a level of complexity that would demand more than one individual working full time for the whole of each quarter just to meet each quarterly timescale.

The Patient Feedback Challenge has allowed us (with our friends at Numiko) to clear the final hurdle and to remove the last vestiges of human effort from the regular reporting, which means that quarterly survey reports and custom extracts will be available 24 hours a day, 365 days a year to anyone in the world with an internet connection. Managers will be able to see the latest results from their area and anywhere else in the Trust at any point in the reporting cycle, and service users and carers will be able to transparently query and scrutinise all of the patient feedback data that we collect.

Two principles have guided the work on data architecture and processing. Firstly, all the technologies that were used are simple, free, and open source. Using languages such as R and Python and reporting technologies such as HTML and LaTeX means that not only are there no costs associated with buying or licensing software but also that analysis and reporting technologies are simple to use straight away. In a process known as agile software development (Beck et al., 2001), all of the technologies that have been produced have been put to immediate use. Early on, only the graphics could be automated, and manual effort was required to put together the report and write textual summaries. Later, graphics and text were both automated and only formatting and final edits were performed by a human being. With the completion of the Patient Feedback Challenge, the formatting can now also be automated. But at each stage the results of the technology were being used live in order to meet the demands of the organisation. This has the twin benefit of giving an immediate payoff to the investment in workforce costs of developing the software as well as giving the opportunity for the technology to be refined in use. The results of each stage of development were obvious when each reporting cycle was complete and work has been undertaken throughout to better serve both the Involvement team who deliver the survey results as well as the clinicians, managers, service users, and carers who make use of them.

The second principle is that all of the work which has taken place within the survey analysis and reporting and the Patient Feedback Challenge has left a legacy not only of technologies that can be used by the organisation but also systems within the organisation and skills and experience within the workforce which allow this work to prosper and flourish. Everyone involved in the data side of the project has put in place new systems to better capture and share data and learned valuable skills which will live within the organisation allowing further development work as well as new projects to be undertaken. A key feature of the whole Patient Feedback Challenge has been the development of staff and volunteers and the work with data and reporting was no different in this respect.

References

Beck et al. (2001). Manifesto for Agile Software Development. Agile Alliance. Retrieved 20-8-2012

Burns, 2004. Spreadsheet addiction. http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html Retrieved 16-5-2012

Unstable net promoter scores

If you work in the NHS you’ll know that the “net promoter score” (aka the friends and family test: “would you recommend this service to family and friends?”) is coming to NHS services. Indeed, it’s hit the headlines with the coalition government promoting its use in the face of some pretty stiff opposition (e.g. here).

I have never been all that convinced by it, particularly the way the score is calculated by subtracting the percentage of detractors (those who would not recommend, or would be unlikely to) from the percentage of promoters (those who would recommend). We’ve started using it in our survey and I have had a query because one area’s scores are jumping around a lot. I thought it was high time I looked at the actual scores. We don’t use a 10 point scale as many do; we use a 5 point scale, with the points being equal to “Extremely unlikely”, “Unlikely”, “Neither”, “Likely”, and “Very likely”. Net promoter is calculated by taking the proportion of “Very likely”s and subtracting the proportion of the bottom 3 categories summed.
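The calculation on the 5 point scale can be sketched in a few lines of base R. This is a minimal illustration rather than the survey's actual code: the `net_promoter` function name, the example responses, and the choice to express the score in percentage points are all assumptions.

```r
# Response labels as listed in the post; the example data are made up.
scale_levels <- c("Extremely unlikely", "Unlikely", "Neither", "Likely", "Very likely")

# Hypothetical helper: promoters (top category) minus detractors (bottom three),
# expressed in percentage points. "Likely" counts as neither.
net_promoter <- function(responses) {
  responses  <- responses[!is.na(responses)]
  promoters  <- mean(responses == "Very likely")
  detractors <- mean(responses %in% scale_levels[1:3])
  100 * (promoters - detractors)
}

example <- c(rep("Very likely", 12), rep("Likely", 5),
             rep("Neither", 2), rep("Unlikely", 1))
score <- net_promoter(example)
print(score)
```

Note that the five “Likely” responses here contribute nothing to the score either way, which is one reason the measure can move sharply when only a few answers change.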

I’ve looked at the methodology in two ways. Firstly, by comparing the net promoter method with just finding the average (allocating 1 to 5 points from “Extremely unlikely” to “Extremely likely”). It’s arguable how appropriate the average is given the negative skew to the responses, but it will have to do as a comparison. It should be noted that I’ve used two y-axes, one blue on the left for the net promoter methodology and one pink on the right for the average methodology. One has to be very careful when using two y-axes: it’s easy to mislead the reader by moving the scales up and down. But really we’re just watching the lines go up and down together over time, so we should be safe here.
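As a rough sketch of this first comparison, the snippet below computes both measures per quarter from a table of response counts. The counts are invented for one hypothetical small service, and the variable names are assumptions; the point is only to show the two summaries side by side.

```r
scale_levels <- c("Extremely unlikely", "Unlikely", "Neither", "Likely", "Very likely")

# Invented response counts: one row per quarter, one column per category.
counts <- rbind(
  Q1 = c(1, 1, 2, 6, 10),
  Q2 = c(1, 2, 3, 8, 6),
  Q3 = c(0, 1, 2, 5, 12)
)
colnames(counts) <- scale_levels

# Proportion of each response within a quarter.
props <- prop.table(counts, margin = 1)

# Net promoter: top category minus the bottom three, in percentage points.
nps <- 100 * (props[, 5] - rowSums(props[, 1:3]))

# Average: score the categories 1 to 5 and take the weighted mean.
avg <- drop(props %*% 1:5)

comparison <- data.frame(quarter = rownames(counts), nps = nps, mean_score = avg)
print(comparison)
```

In this made-up example the two measures rise and fall together, but the net promoter score covers a larger share of its possible range than the average does.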

Here are the results, with each panel representing a different service (anonymised). Image

Actually, it looks pretty okay to me. You can see the service this query originated with, right at the top left, labelled “V”. The promoter scores jump about quite a bit more, but the inverse “V” shape is also evident in the average.

Secondly, I’ve looked at the question itself by comparing it with our overall “How do you rate the service quality” question, scoring both with the net promoter methodology. Here are the results of that.

Image

Again, the results look okay, but you can see again poor service “V” whose family and friends test promoter scores (blue) are jumping around but, interestingly, even with the promoter methodology their Service quality scores are pretty static.

A lot of the objections to the friends and family test (e.g. here) are based on preliminary psychometric investigations rather than “live” data, so this is an interesting addition to the overall debate.

So, if anything, it’s the question that appears to be problematic. I must admit I did think it would be the other way around. I’ll follow up this analysis when we have more data.

Bottom-up creation of data-driven capabilities: show don’t tell

Reblogged from House of Stones:

I’ve been writing lately on what to do when people who make decisions in an organization say they want data-driven capabilities but then ignore or attack the results of data-driven analysis for not saying what they think the data ought to say. Some of the most productive things you can do in that situation include automating your work so you can devote more time and attention to more important (and labor-intensive) projects, as well as…

Fantastic post on how interactive analysis can help sell data-driven answers within an organisation. Time will tell if I can achieve similar things.

Shiny app running at last

I’m currently taking part in the NHS Institute for Innovation supported Patient Feedback Challenge with colleagues at Nottinghamshire Healthcare NHS Trust. We’re delivering our patient feedback in a web system and I have been working with the web developers, the lovely people over at Numiko. They’re doing all the web-type stuff and I’m writing R code, running over FastRWeb, which is fantastic and will get a well-deserved blog post some time soon.
In order to make sure that my code keeps pace with the interface they design, and that there are no nasty surprises when the launch of the website gets near, I’ve been doing some prototyping with the wonderful Shiny package from the amazing people over at RStudio, whose IDE I sincerely love.

I’ve done a little video about it for the challenge which I thought I might as well share on here. I’m blown away by how easy it is to use and although I’m using it for prototyping in this case I can imagine building a whole tool with it and getting something really useful and attractive, maybe with a little bit of JavaScript jiggery-pokery.

Video

I won’t share all the server-side code because it’s a bit of a mess at the moment; I’ll put a cleaner version up on GitHub once it’s a bit further along. Here’s the UI code, though, just to demonstrate how simple it is (really the server side just picks up one or two of the variables and then does various graphs etc. based on code I already had).

library(shiny)

# Define UI
shinyUI(pageWithSidebar(

  # Application title
  headerPanel("SUCE results"),

  # first set up All/ Division results
  sidebarPanel(
    selectInput("Division", "Select division",
                list("Trust" = 9, "Local" = 0, "Forensic" = 1, "HP" = 2)),
    conditionalPanel(
      condition = "input.Division != 9",
      uiOutput("divControls")
    ),
    textInput("keyword", "Keyword search: (e.g. food, staff)"),
    selectInput("start", "From: ",
                list("Apr - Jun 11" = 9, "Jul - Sep 11" = 10, "Oct - Dec 11" = 11,
                     "Jan - Mar 12" = 12, "Apr - Jun 12" = 13, "Jul - Sep 12" = 14)),
    selectInput("end", "To: ",
                list("Apr - Jun 11" = 9, "Jul - Sep 11" = 10, "Oct - Dec 11" = 11,
                     "Jan - Mar 12" = 12, "Apr - Jun 12" = 13, "Jul - Sep 12" = 14),
                selected = "Jul - Sep 12"),
    checkboxInput("custom", "Advanced controls", value = FALSE),
    conditionalPanel(
      condition = "input.custom == true",
      selectInput("responder", "Responder type",
                  list("All" = 9, "Service user" = 0, "Carer" = 1)),
      selectInput("sex", "Gender",
                  list("All" = "All", "Men" = "M", "Women" = "F"))
    )
  ),

  # Show the caption and plot of the requested variable
  mainPanel(
    h3(textOutput("Title")),
    tabsetPanel(
      tabPanel("Stacked plot", plotOutput("StackPlot")),
      tabPanel("Trend", plotOutput("TrendPlot")),
      tabPanel("Responses", tableOutput("TableResponses"))
    )
  )
))
 

But I do want to be data (cross posted)

I’ve got a guest piece on the Patient Opinion Blog which is cross-posted below.

Healthcare scientists will universally recognise the dominance of quantitative methodology over qualitative methodology, with the oft-quoted hierarchy of evidence featuring such quantitative behemoths as meta-analysis and RCT at the top and qualitative methods further down, rather sniffily described as “Case reports” or “Case series”.

As a rather hardcore quantitative scientist myself with a great deal of good feeling towards my qualitative brethren, it was with joy and horror that I read Paul’s recent piece on the Patient Opinion blog “I don’t want to be data, I want a conversation”, which flips the usual dominance relationship on its head and champions qualitative over quantitative methods. Paul argues that the NHS needs to “step outside its mindset” and stop using words like “captured” and “mined” in regard to patient feedback which should be treated, first and foremost, as a conversation.

One of the funny things about being a data monkey in the NHS is that people think of you as rather like a machine, spewing out graphs and computer code, and don’t think about you as getting sick, or tired, or seeing a doctor. But in fact I have rather a lot of chronic illnesses and it seems I’m forever waiting in the GP’s reception or waiting for my consultant to call me. I’ve seen many doctors over the years, some brilliant, some okay, and some really dreadful ones, and I totally identify with the idea that when you give feedback to the NHS you want it to listen to you and respond properly rather than giving you a “corporate” response which really just protects them legally and doesn’t commit to any change.

I do think it’s dangerous, though, to conflate the use of surveys, statistics, and data-focused methodologies with poor quality responses from health providers.

Because actually I do want to be data.

I recognise the value of response rate, sample size, reliability, and validity and although I am a big fan of the story-driven approach adopted by Patient Opinion I worry that if we only had two stories from each service user area we wouldn’t know enough about the silent majority. A good example of this would be where specific service user groups had poor engagement with feedback mechanisms. Their voice would never be heard. Using a data-driven approach we can compare the demographic composition of survey respondents with the known demographic composition of service users and ensure we are hearing everybody’s voice. Where we are not hearing everyone’s voice, we can even use complex survey techniques to adjust summary statistics to better reflect the “true” value of the statistic in the population.
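A minimal sketch of that kind of adjustment, using simple post-stratification weights in base R as a stand-in for fuller complex survey techniques. All of the figures are hypothetical: the age groups, the population shares, and the survey responses are invented for illustration.

```r
# Hypothetical known make-up of the service user population.
population_share <- c("Under 40" = 0.30, "40-64" = 0.40, "65 and over" = 0.30)

# Invented survey: younger service users are under-represented.
survey <- data.frame(
  age_group = c(rep("Under 40", 10), rep("40-64", 40), rep("65 and over", 50)),
  satisfied = c(rep(c(1, 0), c(5, 5)),
                rep(c(1, 0), c(30, 10)),
                rep(c(1, 0), c(45, 5))),
  stringsAsFactors = FALSE
)

# Compare who responded with who uses the service.
sample_counts <- table(survey$age_group)
sample_share  <- setNames(as.numeric(sample_counts) / sum(sample_counts),
                          names(sample_counts))

# Weight each group by population share divided by sample share.
group_weight  <- population_share[names(sample_share)] / sample_share
survey$weight <- unname(group_weight[as.character(survey$age_group)])

unweighted <- mean(survey$satisfied)
weighted   <- weighted.mean(survey$satisfied, survey$weight)
print(c(unweighted = unweighted, weighted = weighted))
```

Because the under-represented group is also the least satisfied in this invented example, the weighted figure comes out lower than the raw one, which is exactly the kind of quiet voice a raw average would miss.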

I’m an atypical example, of course. Most people don’t want to be data, and I frustrate people in my work and home life by continuously harping on about probability, the ecological fallacy, correlation (it’s NOT CAUSATION!), the presence or absence of control groups, covariates, and so on ad infinitum. But I’m proud to be a data monkey and it’s my heartfelt wish to make sure that we can hear all the voices across all of our health services, analyse, mine, factor, and standardise them, and only then will we be ready to have a truly informed conversation over a nice cup of tea (this stage I’ll leave to the experts, I’ll stay here with my spreadsheets, but it’s white with none, thanks).

Statistics

Reblogged from Stats in the Wild:

To most people, statistics means plugging numbers into an advanced calculator that spits out values, without much thought involved. Those people don't work with data.

-Nathan Yau

Cheers.

Hear hear