
Data governance and why it is important – Kate Carruthers – S1.2

In this episode Kate talks about why data governance is important and the features of a good data governance program. With a slight digression on why data is not the new oil and the problem with data.



Hi, and welcome to episode two of the Data Revolution podcast. I’m Kate Carruthers, and this time I’ll be talking about data governance and why it’s important. I will also make a slight digression to share my thoughts on data as the new oil and some of the problems with data. First, a bit of an explanation about what I’m going to be doing with the podcast. I’m planning on a mix of episodes in the near future. One kind will be like this one, where I’m sharing my thoughts on a particular aspect of data practice, and the other will be a series of interesting guests. I’m pretty excited about the lineup of the guests, and we’ll share information about them on my website. That’s data revolution, one word, dot tech.

Today I want to talk about data governance. This is because it’s not well understood, and most folks don’t understand why it’s important. Also, as anyone who’s known me for the last decade or so will know, I’ve been banging on about data governance for years. I’m often at conferences saying that data governance is the foundation for information security, and I want to explain what I mean by this. I tend to think that data governance has been badly named. It’s a name that does not immediately convey the utility of the concept to ordinary folks. It’s not catchy, and it doesn’t roll off the tongue easily.

You can tell that the people who came up with this were not marketers. There are many definitions of data governance, but my favourite one is from a colleague in the US, John Ladley, and it’s from his 2012 book Data Governance: How to Design, Deploy, and Sustain an Effective Data Governance Program. It’s still one of the definitive works in the area, and his stuff is worth reading. He defines data governance as “the organization and implementation of policies, procedures, structure, roles and responsibilities which outline and enforce rules of engagement, decision rights, and accountabilities for the effective management of information assets.”

You can see this definition on my website. I like this definition because it covers all the major features of a good data governance program, and at work, my data governance manager has distilled it even further: “Put simply, data governance is the creation and implementation of rules to protect data and to get the most benefit from data.”

And this takes as its starting principle the fact that we treat data as an asset that needs to be managed like an asset. But one thing this definition does not provide is a reason why we need to do data governance at all. Data governance is an essential risk management function, and it provides key information for decision making around information security and cybersecurity spending. If we don’t understand where our data is, how valuable our data is, or how it is protected, then we’re probably making ill-informed judgments about where to spend our scarce information security and cybersecurity dollars. To protect our data effectively, we need to understand it. Increasingly, our data landscape is moving from a simple, internally hosted one to a complex, multi-hosted landscape across which we disclose, manipulate, and consume data. Often now, our data is hosted across multiple cloud environments, as well as on premise and in software as a service, and this adds to the complexity.

There is also an increasingly complex landscape of privacy and compliance that we need to navigate, and data governance provides the foundations for this as well. A good data governance program will ensure that data is secured, trustworthy, documented, managed, and audited. In my day job, data and information governance is framed around the five knows, which I got from Mike Burgess when he was at Telstra many years ago, and he’s now at ASIO. We work on ensuring that we know the answers to each of these questions for our data.

  • Question one: do you know the value of your data?
  • Question two: do you know who has access to your data?
  • Question three: do you know where your data is?
  • Question four: do you know who is protecting your data?
  • Question five: do you know how well your data is protected?
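As a rough illustration only, the five questions could be tracked per data set with something like the sketch below. The `DataAsset` record, its field names, and the `unanswered_knows` helper are all made up for this example; they are not from any particular tool or from the program described in this episode.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataAsset:
    """Hypothetical record for one data set; every field name is illustrative."""
    name: str
    value: Optional[str] = None               # e.g. "high", "medium", "low"
    who_has_access: Optional[List[str]] = None
    location: Optional[str] = None
    protected_by: Optional[str] = None
    protection_level: Optional[str] = None


# Maps each (assumed) field to the question it answers.
FIVE_KNOWS = {
    "value": "Do you know the value of your data?",
    "who_has_access": "Do you know who has access to your data?",
    "location": "Do you know where your data is?",
    "protected_by": "Do you know who is protecting your data?",
    "protection_level": "Do you know how well your data is protected?",
}


def unanswered_knows(asset: DataAsset) -> List[str]:
    """Return the questions that have no recorded answer for this asset."""
    return [
        question
        for field_name, question in FIVE_KNOWS.items()
        if not getattr(asset, field_name)
    ]


if __name__ == "__main__":
    # An asset where only the location is known.
    asset = DataAsset(name="student records", location="on premise")
    for question in unanswered_knows(asset):
        print(question)
```

An empty result means all five questions have an answer on record; anything else is the list of conversations still to be had with the data owner.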

I literally used to walk around the university in the early days of establishing our data governance program with all of these questions on a laminated sheet, and the conversation would go something like this: “I don’t need no stinking data governance.” “Oh, just answer the questions on this sheet, and if you can answer them, you’re fine.” It would typically result in the person sobbing on my shoulder saying, “Please help me, Kate.” So that was a good way to frame a data governance conversation. Now, if we want to treat data as an asset, then we need to make sure that data is used properly.

And we also need to make sure that we can prevent data errors, especially now with the growth in AI. And we also need to make sure that misuse of personal or sensitive data doesn’t take place. The best starting point to achieve this is through clear policies on data use and effective procedures to monitor and enforce these policies. Another important benefit of a data governance program is improving data security. One of the key objectives of data governance is ensuring that all data is secure if it needs to be, and that there is no unauthorized data access. This means that the Data Governance Office will need to work with colleagues across the organization, and in particular, across information technology, cybersecurity, and information security teams. A good data governance framework must also include specifics of how data can be distributed and shared, both internally and externally to the organization, because inappropriate data sharing is often a vector for cyberattacks nowadays. One thing I’ve discovered about use of data internally is that folks are just trying to get their jobs done.

And they often need data, but don’t understand the risks inherent in the way that they are storing and using that data. Now, I’m going to take a small digression here. I have ADHD and you’ll just have to put up with this kind of thing. There’s a well known article in the Economist that was titled “The world’s most valuable resource is no longer oil, but data.” And it said, quote, “A new commodity spawns a lucrative, fast-growing industry, prompting antitrust regulators to step in to restrain those who control its flow. A century ago, the resource in question was oil. Now similar concerns are being raised by the giants that deal in data, the oil of the digital era.”

But if you think about it, this analogy just doesn’t make sense. This thing about data being the new oil was in the context of the need to regulate the data economy. And I believe that regulation of data does remain an extremely valid point, and some regulation would be a bloody good idea. Several years ago, I used this analogy of data being the new oil in several presentations, but in the context of showing people that there was an awful lot of it to manage. But that was before I really thought about it. Then I realized that data is very unlike oil in important ways. Data is the ultimate resource. It keeps growing and there seems to be very little that we can do to stop its proliferation.

Now, I think that data is the endless resource that is only limited by our storage and analytics capabilities and by our capacity for regulation. So now I want to talk about the problem with data. The problem with data is that it’s really easy to create and store. And so this is often done without any thought as to whether it is the appropriate thing to do. Anyone with an internet connection and some basic skills can start to collect and store data online, and there are no rules for how anyone ought to store data, which explains why so many data breaches are just really some random person who stored personal data in an unsecured S3 bucket. Further, often the safety of stored data does not seem to be top of mind. Remember that enormous Equifax data breach back in 2017? That organization had tremendous amounts of personal information on a global basis and they could not even be bothered to patch against known vulnerabilities. And this experience does not seem to have given rise to any learnings whatsoever on the part of local organizations, many of whom have had major data breaches in recent memory that disclosed the personal data of millions of Australians.

Apart from the obvious implications for data security practices, it seems as if we have not established a philosophy of data and that our model for understanding data and rights in respect of data is that of property. And if we accept this, then we must also accept that our model for privacy, which is based on informed and explicit consent, is also not quite ready for the world where data about us is so readily captured, stored and shared, often without our explicit consent. And then we’ve also got to remember the cookies fiasco where everybody just clicks yes to get to what they need. So utility trumps privacy every time. So it seems to me that we have a lot more thinking to do about data privacy and security. But one thing is clear: data will continue to proliferate. And until we sort these things out, its proliferation and its safe storage and usage are going to remain problematic. This will have an impact on our data governance programs in the future.

Now, back to data governance. There is a quote from Kent Aitken, who was a Prime Minister’s Fellow in Canada, that I use all the time in presentations. He said, “Complexity is the defining feature of the digital era, and we are not adjusting our governance structures to manage it.” I truly feel that this sums up the challenge we have with data governance. Increasingly, we need data governance that can operate programmatically and autonomously at the edge of our networks. But the tool sets that are available to us are only slowly creeping towards that kind of functionality. They’re kind of primitive, to be sure. But coming back to what data governance programs can help with: identifying data at risk is an important consideration.

If you want to identify data at risk, you need to have a classification framework for data. And once you’ve located sensitive data, you need to ensure that that sensitive data is stored and managed properly. That means that you need to have a set of data handling guidelines that specify how to manage data across its entire lifecycle, including data creation, data access, data storage, data transmission, data processing, data integration and flow, data disposal, and data retention. Now, data disposal is the one thing we all need to get better at doing. I keep saying that we are all such terrible hoarders of data. We hoard it like dwarves hoarding gold. And we need to get better at getting rid of the data that we don’t need, because all it is is a risk, not an asset. And you also need to think about things like data sovereignty and data management.
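To make the idea of a classification framework plus handling guidelines a bit more concrete, here is a minimal sketch. The classification labels and the example rules are assumptions for illustration only; the lifecycle stages are the ones listed above. A real framework would be agreed with stakeholders rather than hard-coded like this.

```python
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class Classification(IntEnum):
    """Hypothetical classification labels, ordered from least to most sensitive."""
    PUBLIC = 1
    PRIVATE = 2
    SENSITIVE = 3
    HIGHLY_SENSITIVE = 4


# Lifecycle stages as listed in the episode.
LIFECYCLE_STAGES = (
    "creation",
    "access",
    "storage",
    "transmission",
    "processing",
    "integration and flow",
    "disposal",
    "retention",
)

# Illustrative rules only; a real set of guidelines would cover every stage.
HANDLING_GUIDELINES: Dict[Tuple[Classification, str], str] = {
    (Classification.SENSITIVE, "storage"): "encrypt at rest",
    (Classification.SENSITIVE, "transmission"): "encrypt in transit",
    (Classification.SENSITIVE, "disposal"): "securely delete once retention ends",
}


def guideline_for(classification: Classification, stage: str) -> Optional[str]:
    """Look up the handling rule for a classification at a lifecycle stage."""
    if stage not in LIFECYCLE_STAGES:
        raise ValueError(f"unknown lifecycle stage: {stage}")
    return HANDLING_GUIDELINES.get((classification, stage))


def missing_guidelines(classification: Classification) -> List[str]:
    """List the lifecycle stages that have no rule yet for this classification."""
    return [
        stage
        for stage in LIFECYCLE_STAGES
        if (classification, stage) not in HANDLING_GUIDELINES
    ]


if __name__ == "__main__":
    print(guideline_for(Classification.SENSITIVE, "storage"))
    print(missing_guidelines(Classification.SENSITIVE))
```

Keying the rules on both classification and lifecycle stage makes the gaps easy to see: any stage returned by `missing_guidelines` is one where people handling that data currently have nothing to go on.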

And data management practice is something we need to talk about in a future episode. Some other things that a data governance program can assist with include complying with increasing regulatory requirements, improving data security via collaboration with information and cybersecurity professionals, creating and enforcing data distribution and data sharing policies, creating the basis for effective data and analytics operations (and we’re going to talk a lot about this over the course of the podcast), and also identifying the crown jewels: your precious things that really need to be protected by your cyber and infosec teams. So there are a number of key factors in running a successful data and information governance program. First of all, data governance needs to be a good fit for each specific data domain and for the business operations it supports. It needs to be developed collaboratively with the stakeholders because there’s no single right answer for every part of the organization. And a data governance program that doesn’t take account of the differences across the organization will ultimately be unsuccessful because it won’t meet stakeholder needs.

So data governance needs to be a stakeholder driven activity and you shouldn’t engage in it if they’re not coming along on the journey with you. And the data and information governance framework needs to be able to help the business to better manage information and data quality. If it’s not doing that, then there’s no point doing it. So no data and information governance activities ought to be undertaken without stakeholder buy in and leadership. I always frame our work in the Data and Information Governance Office as facilitation rather than leadership. Now, there are some other things that need to be lined up, and these are in no particular order. You need to assess and define your risk and controls. You need consistent data definitions across the entire organization. And this has been a perennial problem.

I worked on a data warehouse project at GIO back in the day. Inconsistent data definitions were one of our big problems, and I’m pretty sure it’s just the same everywhere. Now we need to have data driven improvements, so using data to drive the improvements also makes sense. One of the big things we also need to build is data literacy, and it’s a real challenge to develop that across the entire organization. And the other thing that we really need to focus on is data quality because all of our AI efforts will be for naught if we have really bad quality data. And then the two practices that I really think every organization needs are master data management and metadata management. Tracking provenance of data as it moves around now becomes even more important. But the most important thing of all that needs to be clarified and agreed is the roles and responsibilities, and in particular the establishment of decision making rights and input rights in respect of data.

Getting to who is the decider is the single most important thing to do, and then the next thing is to empower those people to start making decisions about the data. But above all, managing data risk is a team sport. There’s no single part of the organization that has all the answers. Reducing risk needs collaboration and it needs broad collaboration across the entire organization.

Based on my experience, it really does take a village to establish an effective data governance program. It reminds me of that saying, if you want to go fast, go alone. If you want to go far, go together. So I recommend that you find allies within your organization who can collaborate with you and also find allies outside your organization for knowledge sharing and commiseration and possibly drinks.

That’s all for now. Hopefully you’ll all join me again next time, where we’ll be joined by a special guest. Thank you very much for listening.