This week, while data research expert Troy Hunt was cleaning out his drive, he came across a document that shows over 30 million South Africans have had their personal data exposed.
It’s not every day that an Australian finds themselves at the centre of South Africa’s biggest story, but Hunt has managed exactly that.
Hunt runs a site called haveibeenpwned.com, which allows visitors to find out whether their digital profile has been compromised. Helpful souls send him data – lots of data – to aid him and his site in their endeavours.
In the hours following the breaking of this story, panic levels have risen. We reached out to Troy and he was kind enough to grant us an interview.
htxt: How the hell did this [the data leak] happen?
Troy Hunt: (Sighs) Where do we start, right? Well, chronologically, here’s how it breaks down: in April of 2015 a file was created, and sometime after that it was placed on a web server. I guess the most charitable way to say this is that it was ‘published’ to the server in question.
Now the date of April 2015 is known because the location of that web server was disclosed today and the date on the file was April 2015. That’s about as far back as we can go.
In March of this year someone sent that data to me. Now a lot of people send me a lot of data and this was one more thing that went into the folder of ‘things I should check out later on’. When I took an initial glance at it, nothing really stood out to me at the time. It had a file name of ‘masterdeeds.sql’ and after a quick look at it I saw some South African references, but nothing that clearly indicated what it was.
It was only this week, when I was going through the data I’d been sent and needed to clear some of it out, that I came back to this file – it’s a big 27GB file. When I started probing into it I saw a lot of South African references again, and just over 24 hours ago I put out a few tweets and asked if anyone could help. This was really curious and there was no obvious source for the data.
South African followers: I have a very large breach titled "masterdeeds". Names, genders, ethnicities, home ownership; looks gov, ideas?
— Troy Hunt (@troyhunt) October 17, 2017
From there, things moved pretty quickly. We started working out that this data contained national IDs, ethnicities and other sensitive information that has possibly been sourced from a local company. So far though, no one’s claimed responsibility for it.
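For readers wondering how anyone "probes" a 27GB file, the usual approach is to stream it line by line rather than open it whole. The snippet below is a minimal sketch of that idea, not Hunt's actual process: the function name `count_keyword_hits`, the keywords and the sample rows are all invented for illustration, and it assumes the dump is plain text.

```python
import io
from collections import Counter
from typing import Iterable, TextIO


def count_keyword_hits(stream: TextIO, keywords: Iterable[str],
                       max_lines: int = 100_000) -> Counter:
    """Count how many of the first `max_lines` lines mention each keyword.

    Matching is case-insensitive. Only one line is held in memory at a
    time, so the size of the underlying file does not matter.
    """
    lowered = [k.lower() for k in keywords]
    hits: Counter = Counter()
    for line_number, line in enumerate(stream):
        if line_number >= max_lines:
            break  # a sample of the file is enough for a first look
        text = line.lower()
        for keyword in lowered:
            if keyword in text:
                hits[keyword] += 1
    return hits


# Tiny in-memory stand-in for a real dump; these rows are made up.
sample = io.StringIO(
    "INSERT INTO places VALUES (1, 'Cape Town');\n"
    "INSERT INTO places VALUES (2, 'Johannesburg');\n"
    "INSERT INTO places VALUES (3, 'Cape Town');\n"
)
print(count_keyword_hits(sample, ["Cape Town", "Johannesburg", "Durban"]))
```

With a real file, the `io.StringIO` object would be replaced by something like `open(path, encoding="utf-8", errors="replace")`, where `path` is wherever the dump lives.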
htxt: That really shouldn’t be that surprising should it?
TH: You say that but I’ve gone through this process a lot of times and no matter how many times I go through it, the same thing always happens; the organisation responsible denies any wrongdoing right up until the point when they can’t deny it anymore.
To be honest, until the organisation in question acknowledges it’s its data, I’m reticent to throw anyone under a bus. But until that happens all they’ll do is dig themselves in deeper. These situations have one inevitable conclusion: eventually you have to come clean because there’s insurmountable evidence that points to a source and the longer you leave it, the worse things become.
htxt: This has been described as a data breach, but it’s not really a breach, is it?
TH: Let’s be clear about this: there was no breach of security here. Whoever owns this data published this data to a public-facing web server. It’s worse than people think.
First off, someone had this data – and I think there are some pretty serious questions to be asked about whether anyone should be collating this sort of data without anyone’s consent – and second, they then published just about the entire population of South Africa’s data to a public-facing server that has directory listing enabled as well. This means that when you go to this server you can just click on any file and download it. There’s no obfuscation here to try and hide it.
Now this is the sort of thing that happens by mistake. I believe that no one decided ‘hey, let’s publish this information to the world’. But the fact remains that this data was published on a public-facing web server. There’s no darknet machinations involved, no hackers involved; this data was made free and available to the entire world.
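Directory listing, as Hunt describes it, is a web server feature that generates a clickable index of every file in a folder. As a rough illustration for administrators who want to check their own servers, the sketch below flags a page that looks like such an index. It is a heuristic only: it assumes the page title starts with "Index of /", which is what Apache and nginx produce by default, and the function name `looks_like_directory_listing` is made up for this example.

```python
import re

# Assumption: auto-generated index pages carry a title such as
# "Index of /backups". Customised servers may use something else.
_INDEX_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the HTML resembles an auto-generated file index."""
    return bool(_INDEX_TITLE.search(html))


# Two hand-written pages for illustration; neither comes from a real server.
listing_page = (
    "<html><head><title>Index of /backups</title></head>"
    '<body><a href="example.sql">example.sql</a></body></html>'
)
normal_page = "<html><head><title>Welcome</title></head><body>Hello</body></html>"

print(looks_like_directory_listing(listing_page))  # True
print(looks_like_directory_listing(normal_page))   # False
```

A positive result means anyone who finds the address can browse and download the files, which is why the feature is normally switched off on servers that hold anything sensitive.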
htxt: So it’s essentially like a garage sale – you go in and help yourself?
TH: Well, yeah, except at a garage sale you normally pay some money. And of course the intent at a garage sale is you want to make the goods available. Now in this case, there’s no way someone would have done this intentionally. It was a mistake – but it was an egregiously bad mistake.
htxt: In the piece published on Business Day, it was suggested the leaked data was linked to home ownership.
TH: Yeah, well that’s part of the data but it seems to be far more extensive than just what you’d have for home owner data. There’s data in there about racial mixes – just how white you are? Now, I’m not trying to make assumptions about South Africa but I don’t imagine some of this data is necessary information for buying a home.
htxt: Is there any recourse anyone whose data has been leaked has? Or is the data now out in the wild and there’s nothing they can do?
TH: The reality is this: once your data is out there, it’s out there forever. That’s it. It’s kind of like asking ‘if someone pisses in my pool, what can I do?’. Well, you can drain the pool, I guess, but you can’t drain the internet. So this is a one-way street.
We know the data has been there since March of this year. The best case scenario is that it’s been there for seven months because that’s when someone found it and then it was sent to me. But according to the file date it’s been there for two years.
The problem is, we can surmise when it was posted, but we don’t know where it’s been distributed since. The only safe assumption we can make is that the majority of South Africa’s population who are alive – not to mention some who are deceased – have had their government-issued personal ID numbers and everything else exposed to anyone who wants to pick them up.
htxt: You say that the best case scenario is that the data dump has only been around for seven months. You say that the original date on this dump is 2015. Has this file been updated at all? Has new data been added since it was first uploaded?
TH: That’s not clear. The original date of upload was April 2015. But I don’t know if this was a file being constantly backed up from the location of its source. That’s an entirely feasible proposition; someone could have been cycling through weekly back-ups and dumping the data into the same location.
This isn’t unprecedented; we’ve seen this sort of thing before over here in Australia last year. There was a situation involving The Red Cross Blood Service in which someone had backed up information in a publicly facing database.
htxt: What would your advice be to someone whose data has been compromised?
TH: Well, look, there are a couple of things here. Out of the tens of millions of records that have been compromised – and I don’t know how many there are because I had to shut down the process before I got on a plane – it was just over 30 million…
htxt: Wait, there are more?
TH: Oh yes. It’s way more than 30 million. Definitely more than 30 million. We’re running it now. I had to close it down before I jumped on a plane. My import process stopped at around 31 million. When this process runs again it’ll be a larger number. I don’t know whether it’ll be 32 million or 42 million, but I’m betting it will be a larger number.
Right now we’re sitting at 2.2 million email addresses that users can access through the haveibeenpwned.com service, but that is going to be a single-digit percentage of people.
The next thing that needs to happen is that the source of this leak needs to come forward.
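Hunt's "single-digit percentage" remark can be checked against the figures quoted in this article. The sketch below is simple arithmetic, assuming roughly one record per person, which the interview does not confirm; the helper name `share_percent` is ours.

```python
def share_percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


EMAIL_ADDRESSES = 2_200_000  # searchable addresses quoted by Hunt

# Record totals quoted in this article; both are approximate.
totals = {
    "import stopped at about 31 million": 31_000_000,
    "later figure of 60 million": 60_000_000,
}

for label, total in totals.items():
    print(f"{label}: {share_percent(EMAIL_ADDRESSES, total):.1f}%")
# import stopped at about 31 million: 7.1%
# later figure of 60 million: 3.7%
```

Either way, well over 90% of the records have no email address attached, so most affected people cannot be found by searching for an address.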
- This interview was conducted yesterday. htxt.africa has spoken to Hunt since and he’s revealed that over 60 million South Africans have been affected in this leak. This includes South Africans who are deceased and those living overseas.