The challenge of big data

As titles go for an e-book, the Information Commissioner’s Office (ICO) has not gone for one that grabs the attention and has instead stuck with the descriptive. Let’s face it, if you saw a book in Waterstones entitled Big data, artificial intelligence, machine learning and data protection, you probably wouldn’t pick it for holiday reading. At 114 pages it is also long. If you process ‘big data’, it is aimed at you.

As you have no doubt already worked out, there is too much in the book to cover in a short article, so this is little more than an overview. The book repays reading.

The PDF starts by defining big data as:

“…high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

In brief, it means massive datasets and real-time data drawn from a variety of sources.

It then goes on to define artificial intelligence (AI) and machine learning (ML), which are often confused and might well continue to be so even with the listed differences.

The next two chapters cover data protection implications and compliance tools. They are followed by one headed ‘Discussion’, although ‘Explanation’ might be more accurate.

Perhaps the most relevant chapter is the one which lists six ‘Key Recommendations’. These require organisations to:

1/ consider whether there is a requirement for personal data and whether anonymisation before analysis would be the better option.

2/ ensure transparency in dealing with personal data. The book has a section on privacy notices, which are to be used throughout and at appropriate stages.

3/ use privacy impact assessments (PIAs) in processes to identify risks to personal data. PIAs have been promoted by the ICO.

4/ adopt a privacy by design approach. This too has been mentioned in other contexts.

5/ develop ethical principles to reinforce key data protection principles.

6/ implement innovative techniques to develop auditable machine learning algorithms.
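To make recommendation 1/ a little more concrete, here is a minimal Python sketch of what stripping and coarsening personal data before analysis might look like. It is not taken from the ICO's book. The field names, the sample records and the helper functions (`age_band`, `reduce_record`, `smallest_group`) are all hypothetical, and removing names alone does not amount to anonymisation, which is why the sketch also counts how many records share the same remaining details.

```python
from collections import Counter

# Hypothetical field names, chosen purely for illustration.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def reduce_record(record: dict) -> dict:
    """Drop direct identifiers and coarsen the remaining personal details."""
    reduced = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    reduced["age"] = age_band(reduced["age"])
    # Keep only the outward part of a UK-style postcode, e.g. 'LS1'.
    reduced["postcode"] = reduced["postcode"].split()[0]
    return reduced


def smallest_group(records: list[dict], keys: tuple[str, ...]) -> int:
    """Size of the smallest group of records sharing the same key values."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Made-up sample data.
customers = [
    {"name": "A. Example", "email": "a@example.com", "age": 34,
     "postcode": "LS1 4AP", "spend": 120},
    {"name": "B. Example", "email": "b@example.com", "age": 37,
     "postcode": "LS1 2QT", "spend": 80},
    {"name": "C. Example", "email": "c@example.com", "age": 52,
     "postcode": "M1 1AE", "spend": 200},
]

reduced = [reduce_record(c) for c in customers]
k = smallest_group(reduced, ("age", "postcode"))
print(reduced)
print("smallest group size:", k)
```

With this sample data the smallest group contains a single record, so one customer is still unique on age band and postcode area. A result like that suggests the data may need coarsening further, or that it should continue to be treated as personal data.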

The ICO also stated the obvious: big data is personal data and all the laws and regulations apply. However, some of the stricter rules under the GDPR will have an impact on big data. The ICO gives the example of automated decision making. Individuals enjoy a qualified right not to be subject to it, so consent will require information on the processes involved to be clear and unambiguous.

You may have noticed that there are requirements for companies to ‘develop ethical principles’, to ‘implement innovative techniques’ and to ‘develop auditable machine learning algorithms’ (see 5/ and 6/ above). Or, to put it another way, much of the development work for systems is left to us.

This might be seen by some as the ICO opting out of its responsibilities and dumping its regulatory function on us. Well, maybe there’s some truth in that. On the other hand, it gives us flexibility to design bespoke systems to suit specific needs.

The ICO is obviously concerned with the challenge of making big data comply with the regulations. The book is extremely helpful, and not being pedantic is one of its best features. I assume the ICO will, as ever, respond promptly and clearly to requests for clarification of any point.





