As the deadline to submit something for the Scientiae carnival draws nearer, I have found it very difficult to decide on the challenge I want to share. Partly, this is because if I give too many details, anonymity is out the door. Partly, it is because it looks like my biggest challenge is happening right now.
There’s only a couple of months left for me to finish my PhD, and part of that revolves around four datasets that should give the same results but, very frustratingly, don’t. Let me explain.
Dataset 1A was gathered in 2006 and gave very interesting results. Another student was engaged to follow up on this and ran a partial replication, giving us Dataset 2A (2007). However, the results from 1A and 2A turned out to be opposite. This was rather distressing, since I’d already published on dataset 1A by then!
After much hemming and hawing, I decided to do a full replication of both 1A and 2A. The data were gathered last year, so now I also have datasets 1B and 2B.
And while 1A and 1B are reassuringly similar (down to having the same fit problems due to a one-dimensionality assumption), 2A and 2B are not. Considering that 2 is already a partial replication of 1, it is rather difficult to explain WHY neither 2A nor 2B gives the same results as 1A or 1B. But wait… it gets better! 2B also doesn’t replicate 2A! So the past couple of months have been spent looking at other ways to analyse these data: some very in-depth exploratory data analyses and some new ways to try to find out about the possible multi-dimensionality of these data.
The data have me tearing out my hair (almost literally), because in my planning I’d already finished this off by now and moved on to another dataset. And while all this hassle may prove to be the connection that I’m looking for, and while it is very interesting, I’m also on a deadline.
There’s another meeting with my supervisor in two weeks, so let’s work so that there’s progress to report by then.
Also, before I added this sentence, the word count was 333.