Introducing the Data Revolution podcast – Kate Carruthers – S1.1

Welcome to Episode 1 of the Data Revolution podcast.

In this episode I cover my personal journey into the world of data and give an overview of some key concepts for future episodes.

Transcript

Welcome to my podcast. It’s called Data Revolution. The Data Revolution podcast is about exploring the intersections between business intelligence, data analytics, artificial intelligence, privacy, data protection, cyber and information security. And as I keep saying, data and cyber are the two biggest growth industries left to us now.

My name is Kate Carruthers, and for this first episode, I thought I’d share my personal journey and how I came to be fascinated by data and the amazing and terrifying things that it can do for us. I work at the University of New South Wales as Chief Data and Insights Officer, starting in that role way back in 2014. It was the first appointment of a Chief Data Officer or CDO in Australian higher education. Before that, I’d been working in the engineering faculty as the IT manager for a couple of years.

I was managing the plethora of technology that makes engineering teaching and research work, including the many high-performance computing clusters. Back in the day, we had more than 20 HPC clusters in the faculty, as well as access to various external HPC facilities, such as the National Computational Infrastructure at the Australian National University, the Pawsey Centre in Perth, and multiple international facilities. Because one thing I learned back in those days was that engineering has an infinite capacity for compute and storage. But one day I made the fatal error of asking some questions, questions about data. Like where were we allowed to store sensitive research data? What were the rules for how we had to handle such data? And there were no good answers. So I ended up applying for the role and got it. The interview process was fascinating to me because I’d never really thought of myself as a data person. I’d always been an IT person.

I’d worked in all sorts of roles across the information and communications technology landscape before landing in the world of projects, and large projects because small projects bored me. But in the olden days last century, I’d worked as a database administrator, as a data modeler, and I swear I am the world’s worst data modeler. I am slow and ineffectual. I had also worked as a consultant on the implementation of several enterprise data warehouses, or EDWs, using tools like Cognos, Oracle, Hyperion, Business Objects, and others that are all long gone now. I even lived through the Kimball vs. Inmon data warehousing wars of the 1990s. This was where folks took opposing sides in a data warehousing structure debate, coming down either on the side of Ralph Kimball and his conformed dimensions in star schemas, or on the side of Bill Inmon and his normalized forms for entity structures and data marts for querying. Full disclosure, I was a Kimball fan back in the day and every data warehouse that I’ve worked on has been a Kimball-style one.

But Inmon has now finally won me over with his idea of the data lakehouse, which we’ll cover in future episodes. But in addition, I had also worked extensively in web and e-commerce development at scale and also in digital strategy and digital marketing. I was cognizant that the world was undergoing technological changes that were akin to the Industrial Revolution. And I realised that we needed to get ready for these changes. So in reality, I was kind of a data person. But as I started to think about the future, and remember this was back in 2014, I started to realise that all our digital transformation would be driven by data. Digital is actually impossible without data, and if the university wanted to transform, as many businesses did back in the day, and bear in mind, they hadn’t even thought about digital transformation back then, they would need to sort out their data. And that is what I’ve been working on ever since.

Now, I just want to touch on some themes that I’ll be coming back to in this podcast. First of all is data. Back in the day, we used to hoard data, storing it in databases like dwarves hoarding gold. But we never really did much of anything with it. We ran some reports, but this data was static. It sat there, and it was difficult to join up even for reporting. And much of the reason for the rise of the enterprise data warehouse was to deal with the issues relating to the constraints of physical servers, of physical disk and memory. And part of what made even enterprise data warehousing really tricky was that you had finite-scale servers with finite-scale disk and memory.

And a lot of the things that both Ralph and Bill talked about were due to the constraints of physicality. But now with the cloud, data can flow, it can scale almost infinitely, and with all those memory and disk constraints gone, we can now do almost anything we want with our data. Now data flows and it drives business processes. It drives autonomous actions at the edge of our networks. And it’s the key to so many things in our world. Secondly, there is the rise of artificial intelligence, or AI. And this is about to change our world in ways that we can only dimly discern at the moment. I want to briefly explain how I conceptualize the AI world and how I explain it to normal human beings.

So let’s start with artificial intelligence or AI. This is the field of computer science that wants to create intelligent machines that can replicate or even exceed human intelligence. It’s the idea of machines thinking like people. It’s kind of the umbrella term for anything in this space, but typically once it’s in production, it’s got a different name. So AI is the general thing, but the specific thing has a specific name once it turns into a production application. And a subset of AI is machine learning, or ML. This is the subset of AI that enables machines to learn from existing data, and also to improve upon that data to make decisions or predictions. Tom Mitchell wrote the book on ML, literally the book, as recently as 1997, so it’s still fairly new.

But that’s been powering our predictive analytics for a long time now. The next subset of it is deep learning, which is a machine learning technique in which layers of neural networks are used to process data and make decisions. And these are typically able to make supervised and unsupervised decisions. And then there’s generative AI, which enables AI to create new, and new is kind of debatable, visual or auditory content given prompts, so we talk about prompt engineers now, or existing datasets. So these are tools like ChatGPT or AutoGPT that are rewriting what is possible now. And I was at a conference this week where one of the professors at Nanyang Technological University in Singapore was talking about some of the legal issues that are arising in respect of things like ChatGPT. And she was saying that one should not rely on its ability to discern if something is genuinely new.

So she did recommend not using poems that it writes as your own because you might be up for a copyright violation. But this all gives rise to the next thing. There is now a huge need to protect all of this data. And this is where cybersecurity and information security come into play. And now I want to take a moment to define both of these things, just so we can be clear what I’m talking about. So cybersecurity, or cybersec, is all about protecting our assets from external threats. Or as NIST, also known as the US National Institute of Standards and Technology, likes to say, quote, the ability to protect or defend the use of cyberspace from cyber attacks. So these are people outside your organization trying to attack your cyberspace.

And information security, or InfoSec, is all about maintaining the CIA, that is the confidentiality, integrity and availability of data. As NIST says, quote, the protection of information and information systems from unauthorized access, use, disclosure, disruption, modification or destruction in order to provide confidentiality, integrity and availability. Now, both of these are exceptionally important now, but the thing that keeps me up at night is data integrity. For example, just imagine if you’re wearing an implanted medical device. You would want to be very certain of the CIA for this device. And it’s changed a lot. Back in the late 20th century when I was managing my first IT system, it was a Unix system. It was running System V Unix.

And it had a whole 16 megabytes of RAM. We never really gave any thought to information security or cybersecurity back in those days. They were much more innocent days. In those times hackers were more interested in phreaking, that’s with a PH, or hacking into telecommunications. So they were often hacking telcos so that they could make free phone calls. But those times are gone now. Now it’s easier to run a malware attack than it is to rob a bank. Late last century, bank robbing was a very, very popular way to make money.

But now you can sit at home in your pajamas and just rob from wherever you are. And this is only gonna get easier due to things like generative AI, because you can ask it to write your malware. I’ve had learning the Rust language on my to-do list for ages, and I never get around to it because I keep doing other things, but recently I asked ChatGPT to write some Rust code for me and it wrote it, I took it, it ran, it was fine. So developers, it’ll be interesting to see what happens with your jobs in the future. Other things are changing too, things like the face of war. In the past, wars were declared up front between two states, but now Russia’s been at war with the West for the past decade at least, and has been waging an information war against us, using tools of disinformation and misinformation. And with the advent of the Stuxnet attack by the US and Israelis, the notion of states reaching out to interfere on foreign soil without even setting a foot there became a reality. Now, Stuxnet was a malicious computer worm designed by the US and Israeli intelligence services and it was deployed to disable a key part of the Iranian nuclear program.

And it was discovered in about 2010. It might’ve been written in 2005, I’m not sure, but it heralded a new world where we can have wars that are not your traditional declaring a war and fighting a war, two armies standing face to face fighting. Now there is all of this information warfare that is happening. And now in Ukraine, we’re seeing new ways of waging war with drones and new AVs. And technology and data underpin all of this. So the future of war is data-driven too. So I would argue that the future of everything is data-driven. So some of the topics that I’ll cover in future episodes will include AI and ethics, data governance and why it’s essential.

I’ll also be looking at things like new jobs that are emerging, what practices data professionals will need to adopt, and how new technology is changing the face of warfare, because it’s a particular interest of mine.

That is all for now. I’m going to try for a fortnightly cadence. For non-Aussies, that means every other week. Hope you’ll join me again next time. Thank you.