E62 - Chris McLellan, Director of Operations at Data Collaboration Alliance

48:20

SUMMARY KEYWORDS

data, copies, people, collaboration, problem, datasets, node, privacy, create, project, collaborate, called, sustainability, debbie, organizations, duplicates, community, alliance, world, network

SPEAKERS

Debbie Reynolds, Chris McLellan


Debbie Reynolds  00:00

Personal views and opinions expressed by our podcast guests are their own and are not legal advice or official statements by their organizations. Hello, my name is Debbie Reynolds. And this is "The Data Diva Talks" Privacy podcast, where we discuss Data Privacy issues with industry leaders around the world, with information that businesses need to know now. Today I have a special guest on the show. His name is Chris McLellan. He is the director of operations for the Data Collaboration Alliance. He hails from Canada. Hello.


Chris McLellan  00:38

Hey, Debbie. It's a pleasure to be here.


Debbie Reynolds  00:40

Yeah, this is going to be fun, because you and I met on LinkedIn, you contacted me, and we actually have some friends in common. So I'm gonna throw out Jeff Jokisch, who's a friend of ours. And he actually is the person who introduced me to you and to the Data Collaboration Alliance and the stuff that you guys were doing. And you and I chatted, and we hit it off right away. So we're actually having fun finding ways to collaborate together and just spread the word about data and data folks and how they can get involved. So I would love for you to describe kind of your journey and the things that the Data Collaboration Alliance is doing for anyone who doesn't know what it is.


Chris McLellan  01:28

Sure thing. Thanks, Debbie. And Jeff, shout out. He's a great person and does a lot for the privacy sector and community. So find him on LinkedIn and be friends with him; you won't regret it. Yeah, so my background is, broadly speaking, technology and startups, and I've been at this for a good 20 or 25 years. And in a funny way, it's come a bit full circle; I started in networks, like black box telecommunications and data networks, back in the sort of mid-90s, for a company called Newbridge Networks, which is now part of Alcatel or Nokia, I believe. And that's how I sort of got started in the technology industry. And since then, it's been a real journey of sort of the next big wave of tech. And so I followed that with ventures into the web, and the web was followed by mobile, and mobile was followed by social, and then social in a funny way led me back to data. But in kind of a different way. I was a marketer and a growth expert for startup companies; I was looking at the future, as we often do. And as we all know, marketing and growth are very susceptible to trends and changes in channels and best practices. And I was looking ahead, and I was noticing a lot of things coming down the road in the form of automation in one form or another. And I was like, okay, so a lot of what I do for a living is getting increasingly automated. And maybe I should look further into this and master these tools before they master me. And that led me to artificial intelligence and understanding more about machine learning, specifically within artificial intelligence. And I was like, okay, so this is an algorithm, like, I get what an algorithm is, it's like if this then that, but at a large scale, that's, you know, that's not too difficult. And I get it, and I get how good quality data will make that whole magic happen. But then I was like, okay, so what data is it feeding on? And then I started looking into that.
And as it turns out, as we all know, a lot of data is the answer to that question. And so then I was thinking, well, where are they getting all the data from? This was before the notion of synthetic data had come along and things like that. And it's like, well, from people, from companies. And I was like, don't people own that information already? And what are these hungry algorithms feasting on? And it's like, well, yeah, internal data. And sometimes they do alliances, and sometimes they form social media companies and take the data there. But I recognized pretty quickly that the real challenge to developing the automation and AI/ML sector was sourcing data. And I was like, the problem within that is getting access; you know, you can't just take it, and there are regulations and all sorts of things preventing folks from doing that. So it's like, how could you work on data without giving up control? I was like, what if an AI, or anybody else for that matter, could work with somebody else on their datasets without either party giving up control? And I was like, wouldn't that be awesome? And then I was trying to find the word for that, right. And the word for that, after some searching, was collaboration: collaboration as a form of sharing where you don't give up agency or ownership of the thing you bring to the collaboration. And I was like, okay, so now I've got my word, because, like most people, I never really thought about data collaboration or the meaning of the term collaboration. I just knew it meant cooperating in some way. But it was the ownership thing I didn't really think about. And so I was like, okay, so who's solving that problem? And if they do that, they'll probably solve a lot of other problems, too. And so I did some searching. And I came across a company where I now work. And they've enabled me to start this nonprofit called the Data Collaboration Alliance.
And the Data Collaboration Alliance is all about advancing and accelerating, through actual projects, not just white papers and meetings, but through actually doing the hard work, this notion that two parties, whether an individual and an organization or two organizations, can work on data and collaborate on data together without giving up ownership or exchanging copies of their information. And that's how I ended up where I am today at the Data Collaboration Alliance, where we're doing that work and making it real.


Debbie Reynolds  06:14

Wow. Now, I didn't even know all of that background; that's really great. Well, collaboration is important. And one reason I was attracted to what you guys are doing is because I like to do collaborations like this, where you kind of join together and do something together that you probably couldn't do on your own. So we're sort of creating new stuff as we're collaborating and talking and having people, you know, join in in that way. I would love for you to talk a bit about some of the initiatives that you're working on now. Because there are quite a few big ones that we've talked about recently that I'd love for you to be able to explain to the audience.


Chris McLellan  07:03

Sure, be happy to. So at the Data Collaboration Alliance, we have three main initiatives. And they're almost nested like Russian dolls; they all sort of feed off of each other. Our biggest one and the most global in nature is the Collaborative Intelligence Network, or the CIN. And right now, that's a collaboration on a document, a blueprint for putting into place the technologies, the interoperabilities, the protocols, and the standards that will make a global Internet of interlinked datasets possible. And a lot of the underlying technology that supports collaboration, instead of sharing, or sorry, copy-based sharing, is related to some technologies that have been around for a while. And so, whereas AI and ML are about really reflecting how the brain learns, think about the brain, the architecture of the animal brain: it's a collection of neurons and axons. It's a network. And really, what we're trying to do is create a blueprint for a network of datasets that can be interlinked in what we call nodes. And each of these nodes could be owned or hosted by an individual. So there could be Debbie's node, or it could be an organization or a small business, or even an enterprise or government agency. Nodes are flexible. Nodes come together and interact to make a network. But this is a controlled global network architecture. There are no copies between nodes. This is exchanging access grants between them. And so, it takes a bit of a deeper dive to get into some of the underlying frameworks. But that's what the blueprint project is all about: we're widening the circle of contributors to this Google Doc. And soon, it will be open for public viewing, to get more feedback, and thoughts, and ideas on how we can bring this all together.
There's a lot of initiatives, as you know, in the world about data privacy, data ownership, data protection, but it would be a sad irony if we made a silo of the attempts to desilo data, if you see what I mean. And I think a big part of this Blueprint Project, which you can see on our website at DataCollaboration.org, is to bring these parties together. And so, sure, there's self-sovereign identity in there, there's blockchain. There's linked data, as I mentioned before, and it's really an exciting thing to bring all these voices and ideas and different initiatives together around one table to try to create, like I said, a blueprint for, you know, I wouldn't necessarily align it exactly to Web 3.0, but then what is Web 3.0, though? It's a concept. And maybe this is one manifestation of that concept. So that's our first big initiative.


Debbie Reynolds  09:58

So let's dig deeper. Now you and I had a chat about this kind of node and the way the Internet works and data sets and sharing. So I'd love for you to go a little bit deeper and explain to people what you mean by nodes and how we share data today, and how we're thinking about sharing it in the future.


Chris McLellan  10:23

Well, if you think about where data lives today, it lives in spreadsheets, and it lives in application-specific databases and other types of data stores, you know, data warehouses, data lakes, you know, data this, data that. And so really, what these are is individual silos; often you'll hear the term data silos or data fragmentation. And so, you know, this all started in 1979 or so with the advent of the relational database and Oracle, and then what came quickly after was we got kind of addicted to apps, and an app for everything, and a database for every app. And fast forward 40 years, and now we've got billions of data silos. But increasingly, we recognize two things about this architecture for data management. One, that it makes copies a necessary evil, if you will, of any innovation project. So if you want to build a new app, or stand up a new project for, you know, training an AI/ML algorithm, or building a dashboard, or predictive analytics, or building a new real-time system, you're going to create a new database, and you're going to have to integrate data from some or many of the older databases that are in your rearview mirror, and that's called data integration, and that generates copies. And the problem with copies is they're very time-consuming and expensive to manage. We estimate that for any IT project or digital transformation project, anywhere between 40 and 50% of the time and money is spent on data integration. So that alone is a tax on humanity's ability to innovate, which is, really, unfortunately, a necessary evil today, but it's only getting worse. It's not getting incrementally worse; it's getting exponentially worse, because as the complexity increases, the integration sort of goes up in a hockey stick. And so that's one outcome.
And the other issue with data fragmentation and managing data in silos is the fact that, you know, for data privacy and data ownership, data protection, it's just like, you know, we make it difficult to copy money for a good reason. Because when you start to copy something, you erode your ability to protect its value and to control it. And you know, increasingly over the last few years, the world has recognized that data has great value. And when we talk about PII, it kind of is us. You know, if our data is us, then, you know, it's as valuable as we value ourselves. And so, you know, to copy it is to create the conditions for lack of control. And it's not an us-and-them thing; companies like your bank don't make copies of your data because they're trying to erode your control of it. They do it because of what I mentioned in the first part of this, which was that's how technology gets built, and they undertake the data integration and generate copies, not only within their four walls but down their digital supply chain. So those copies are ending up eventually in some other jurisdiction outside of their own. And so what are we saying then? Now your data is being exposed in a third-party environment and another jurisdiction, under different regulatory regimes, under different governance rules, and so on. And auditability becomes next to impossible, transparency becomes next to impossible, but really, control for that original data owner, whoever it ought to be, is severely compromised. And so, to dig into that, that's the problem. That's a very long explanation of the problem. But what's the solution? Well, don't manage data in silos, and so you don't have to make copies of it. And so if you think about networks, the human brain, as you know, you've got memory storage of a carrot. But when you want to make a carrot cake, you don't have to make a copy of the idea of a carrot.
When you want to make baked carrots, you don't have to remake a copy of the concept of a carrot, because the idea of a carrot is in a network that is in your brain. And it can be connected to these different outcomes, like baking this recipe or that recipe. And so that's what the internet does. That's what the World Wide Web does. You know, there was a time, and sadly, I can remember it, when you used to have to send documents to somebody else as an attachment for them to collaborate on it. And that was a sad reality. But now you have Google Docs, right? We all get together on one version, and we take control, and you can have access controls. And we can all work on the same document at the same time together. And with a robot too; there can be a spell checker in there. So we're collaborating between people, and we're collaborating with, you know, artificial intelligence, all at the same time. And that's all done, like I say, over a network called the web, and the internet is a network of computing devices. And what we're proposing in the Collaborative Intelligence Network is a network of datasets. And the same principles apply to that network as to any network: they get more efficient as they grow, not more complicated; they have the network effect, which is very well researched and well known. And more becomes more efficient, more becomes more access to information. But it can all happen because it's a zero-copy environment; it can all be controlled by access controls, not copies. And that's a little bit of a deeper dive into what we're trying to put forward into the world.
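[Editor's note] To make the "access grants, not copies" idea a little more concrete, here is a minimal sketch in Python. It is not the Alliance's implementation or any real API; the `Node` class and its `grant`, `revoke`, and `read` methods are hypothetical names made up for illustration. The assumption is simply that a node keeps its data in place and checks a grant on every read, so revoking the grant ends access without chasing down copies.

```python
class Node:
    """Hypothetical node: the owner keeps the data and grants access to it."""

    def __init__(self, owner):
        self.owner = owner
        self._datasets = {}   # dataset name -> dict of records (kept in place)
        self._grants = set()  # (grantee, dataset name) pairs

    def add_dataset(self, name, records):
        self._datasets[name] = records

    def grant(self, grantee, name):
        # Sharing is recorded as a grant; no data leaves the node.
        self._grants.add((grantee, name))

    def revoke(self, grantee, name):
        self._grants.discard((grantee, name))

    def read(self, requester, name, key):
        # The grant is checked on every read, so a revocation takes
        # effect immediately.
        if requester != self.owner and (requester, name) not in self._grants:
            raise PermissionError(f"{requester} has no grant for {name!r}")
        return self._datasets[name][key]


# Illustrative usage with made-up data.
debbie_node = Node("debbie")
debbie_node.add_dataset("contacts", {"alice": "555-0100"})
debbie_node.grant("chris", "contacts")
print(debbie_node.read("chris", "contacts", "alice"))
debbie_node.revoke("chris", "contacts")  # access ends here
```

In this toy version the single stored record is the only one that exists, which is the contrast with copy-based integration described above: there is nothing downstream to audit or delete once the grant is gone.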


Debbie Reynolds  16:02

Wow, I love that explanation of everything. Yeah, as you were talking, I'm thinking about the way that people think about the internet right now. So I always tell people, the internet is not a web; it's actually an ocean. So people make it a web by trying to connect things to it. And so I wonder, when we're thinking about interoperability, are we even thinking about solving the right problem, in a way? So right now, what we have is data in all these silos; they have to communicate with other things, so there has to be some level of interoperability. So maybe changing the way the data is created to make it easier to share, or creating these connectors, that's kind of one way to do it. But then, when I think about it, data, in my view, has sort of a life of its own. So it can kind of float around free form in some way, or, you know, less formed in some way, until it can be captured in some other way. I don't know, am I just philosophizing too much? What are your thoughts?


Chris McLellan  17:10

You're absolutely right. I mean, data wants to be connected; what's a name without a phone number, without an address, without a favorite type of ice cream? I mean, with our data, one thing leads to another, and really, that's at the heart of what we're trying to make possible, which is that, like, if somebody has a data set on one side of the world, and somebody has another one on the other side of the world, well, those two could very well complement and extend each other, and they need to be joined up. But there's no infrastructure right now in the world to enable that to happen, short of a data-sharing agreement and a bunch of lawyers and a bunch of copies. And so that's a very difficult and challenging and expensive and time-consuming way to join up data that wants to be joined up. And so your point is exactly right. I often think of data, you know, like ingredients in a recipe that can be used over and over and over; you know, a good reference data set, for example, could be used for hundreds of solutions. And that's exactly what should be enabled and made possible.
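[Editor's note] The "name without a phone number" point can be sketched as two datasets joined up at read time rather than merged into a third copy. This is only an illustration under assumptions: the dataset names, keys, and values below are invented, and `linked_record` is a hypothetical helper. It uses Python's standard `collections.ChainMap`, which presents several mappings as one view without copying them.

```python
from collections import ChainMap


def linked_record(key, *datasets):
    """Return one combined view of `key` across several datasets.

    ChainMap keeps references to the underlying dicts, so nothing is
    copied; later edits by each dataset's owner show up in the view.
    """
    parts = [ds[key] for ds in datasets if key in ds]
    return ChainMap(*parts)


# Two made-up datasets, imagined as held by different owners.
contacts = {"person-1": {"name": "Ada", "phone": "555-0100"}}
preferences = {"person-1": {"ice_cream": "vanilla"}}

record = linked_record("person-1", contacts, preferences)
print(record["name"], record["ice_cream"])

# The owner of `preferences` updates their own data; the linked view
# reflects it because there is only one stored value.
preferences["person-1"]["ice_cream"] = "mint"
print(record["ice_cream"])
```

The design point is that the combined record is a view rather than a new dataset, so each owner keeps editing their own data in one place.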


Debbie Reynolds  18:15

Yeah. So when we talk about datasets, okay, so you guys host different data sets that people can access publicly, correct? Give me an example of a data set that you guys have right now that people can access, and sort of how people, you know, either how collaborators come in, or how people who just want to use and, you know, take a look at those datasets can, you know, do that?


Chris McLellan  18:52

For sure. So, out of the blueprint come a couple of hands-on projects that the Data Collaboration Alliance operates. So one is called Node X. And that's really our for-good project that we do with some of our technology partners to set up nodes at research universities, research institutions, charitable research agencies, even potentially corporate for-good labs and innovation pods. So that's one program we run, and that's on our website, and that's called Node X, and that's how we're going to, like, expand this blueprint into the real world gradually. But the other program you're hinting at is our community in the Data Collaboration Alliance, of which you're a member, and we're very happy to have your participation; it's called Node Zero. So the idea here is, just like the internet started with a couple of Xerox computers, and the web started with a couple of web servers at CERN, the Collaborative Intelligence Network, or CIN, starts with Node Zero. And Node Zero is a login environment where our members at the Data Collaboration Alliance can log in and collaborate on datasets and even on projects to build solutions for their sector. So one example that came out of our community, from Jeff, in fact, whom we mentioned at the top of the show, is called the Data Privacy Legislation Grid. And this is a project that started as a spreadsheet. But you know, a spreadsheet, as we mentioned, is a silo of its own sort. And so, having met each other, Jeff and I developed this notion of bringing that dataset into a collaborative environment, where it could be linked to other datasets that other people can bring in. And it's to the point where the community can now log in to Node Zero and they can view the dataset. So what it is, is all the legislation, all the data privacy and data protection legislation currently in North America, but we're adding GDPR, we're adding other country datasets, reference datasets, to this environment.
And so you can view it by, you know, type, like for education versus public sector versus criminal and surveillance. But members are able to log in; they can view it, they can request changes, they can query it, they can do all sorts of things with it. But it's very crowdsourced. You know, if you think about it, like, it's a lot of people producing something better than any individual could. And furthermore, it's in a linked environment. So now, the columns and the end of this dataset can be linked to others. And that way, it's starting almost a mini-network within our Node Zero environment. So it's really exciting. We're just getting started. And the Data Privacy Legislation Grid, which we have some video and a blog post about, I can share with you for the episode post; people can check it out. But that's just the first. So imagine doing the same sort of approach for sustainability, for open banking, for public health care, for public education; the Node Zero community is all about working on some of these datasets that maybe don't exist in the world but need to exist, that we can come together to create and, as you mentioned, make available to our partners and research organizations or Node X clients, that sort of thing. So it's really exciting. This is where we roll up our sleeves and walk the walk of data collaboration, but also figure out what data governance looks like in a zero-copy environment.
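[Editor's note] The workflow described here (members view the dataset, filter it by type, and request changes to one shared version) could look roughly like the sketch below. Everything in it is an assumption for illustration: the class and method names are hypothetical, the rows are placeholders rather than real legislation, and it says nothing about how Node Zero is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    member: str
    row_id: str
    column: str
    new_value: str
    status: str = "pending"


@dataclass
class SharedDataset:
    """Hypothetical single shared dataset that members edit via requests."""

    rows: dict
    requests: list = field(default_factory=list)

    def view_by(self, column, value):
        # Filter the shared rows, e.g. by sector.
        return {rid: r for rid, r in self.rows.items() if r.get(column) == value}

    def request_change(self, member, row_id, column, new_value):
        # Members propose edits instead of editing a private copy.
        req = ChangeRequest(member, row_id, column, new_value)
        self.requests.append(req)
        return req

    def approve(self, req):
        # An approved request updates the one shared version for everyone.
        self.rows[req.row_id][req.column] = req.new_value
        req.status = "approved"


# Placeholder rows, not real legislation.
grid = SharedDataset(rows={
    "law-1": {"sector": "education", "status": "proposed"},
    "law-2": {"sector": "public sector", "status": "enacted"},
})
req = grid.request_change("member-a", "law-1", "status", "enacted")
grid.approve(req)
print(grid.view_by("status", "enacted"))
```

Because there is one dataset and a queue of requests against it, an approved update is visible to every member at once, which is the "living document" property that a circulated spreadsheet or PDF lacks.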


Debbie Reynolds  22:26

I agree with that. I like this project because I think it exemplifies what you're trying to do, and you're succeeding at it, which is to find ways to help people come in and collaborate together, because this research is extraordinarily difficult to do. So being able to have other people who can help, you know, make this dataset even richer, and then also having it as a research resource for people who need this information, is very important, because as you know, regulations are only getting more robust, so more regulation will continue to happen, and more complication will continue to happen. So it matters to have something that you can share with people who not only need the information but can also help keep it updated. And that's another downside of having duplicates in other places, where certain data sets or, you know, certain duplicates may not have those changes, or, you know, it may not be a living document, so maybe it's in a PDF somewhere and doesn't really get updated. So being able to have something that's kind of a living organism, a living data organism, that can grow and be a resource to people is really important.


Chris McLellan  23:54

Yeah, you've hit the nail on the head; data is a living thing, it changes, and it needs to adapt, but it needs to do so in an environment that can accommodate that change. And so, it's not just a reference dataset; we're actually building tools on the back of these datasets that will also be made available. I mean, one that I'm thinking about right now, and I'm trying to find some people that are excited about sustainability, is this. You've heard, you know, about the COP summit in Edinburgh, in Scotland, about climate change. Now, a lot of the language coming out was about tree planting, and I tree planted as a university student, and I know the job; it's brutally difficult, but you can't just throw trees out of the back of a truck, you know, and hope they'll grow. It's a very science-driven, difficult undertaking. And I think Canada, for example, wanted to sign up to 4 billion trees. And so that struck me as, like, is there a dataset, maybe using geospatial data in this case, which we can accommodate in a node, to create a map and allow, let's say, data contributors from municipalities, anyone with some land where you can plant a tree, to contribute. And they could put in the soil conditions and the weather. And we could then link in the data, the weather conditions, and all sorts of things to create this blended data set that would provide a roadmap, right down to the local ZIP Code, postal code, what have you, of where you could plant trees in your neighborhood, to enable startups that are planting trees, for example, to focus on their operations and a little less on every one of them duplicating the same dataset and trying to figure that out. That's just another example, under the sustainability domain, of something we can do; the Data Privacy Legislation Grid happens to be under the privacy domain. But you can see where this goes, and it's really exciting. And I think we've got a great community of problem solvers that love to roll up their sleeves.


Debbie Reynolds  25:44

I agree; I think that's great. So let's talk a little bit about how you and I connected on PrivacyAccess.com. I'm a data person before I'm a privacy person. So that's another reason why I think we get along really well. So for me, it doesn't matter what you use the data for. But this is, you know, I could go in any different direction with data. So the other thing that you're doing, and I think it is very important, is that you're opening up a way for people who, like us, are kind of data nerds, whatever, you know, the industry that you're in, to be able to join in this collaboration. So it's not just about privacy; it's not just about sustainability; it's kind of all these other different areas. So talk to me about kind of the scope of data people, so that they understand. Yeah,


Chris McLellan  26:40

Well, we're growing. And I really tried to approach the Data Collaboration Alliance as a startup of nonprofits. So, you know, we've had a couple of pivots, and we evolve, and we take feedback from our community. And we're getting a little bit more refined all the time. And so we started out in the privacy domain. But as you mentioned, we're now expanding our scope a little bit into others. So what we did was just look at the global problems, because we want a community of data-centric problem solvers. And so, what are the problems on a very abstract basis? Well, there's sustainability, which we touched on, there's privacy and data protection, which is another domain, agriculture, health care, education, social inclusion, smart cities. And we put in the metaverse because it's kind of exciting and maybe attention-grabbing, and we'll see where that goes. But if it's going to be what it might be, then we should enter that whole fray now with the notion of data owners having control of their information and not create a, you know, quote, unquote, second company, again. So those are some of the other domains we're looking at. And so anyone who's data-centric, like you and I are, and maybe you love working on a good spreadsheet and creating a data model there, or maybe you're a database administrator, or maybe you're a data visualization pro, or, you know, anyone in this whole range of folks, they're all welcome to join us. And there will be a data set, and they can contribute ideas. Like Jeff nominated the Data Privacy Legislation Grid, they can do so for their own domains. And we can start to create reference datasets to solve some problems and provide a resource for their sector, where they have a passion. And that's really exciting. And the other thing I didn't mention before is we also do a lot of work in standards. So we've been a big part of this thing called zero-copy integration.
And we're very excited that after a couple of years of supporting it, it's soon to be, fingers crossed, a national standard in Canada under the data governance domain. And we're in very early but significant talks with international standards organizations to help people understand it; it's really a framework. So we have the other projects that we have at the alliance, the Node Zero blueprint, the Node X Program, and the Node Zero community, but we're also developing the standard. So basically, anyone, maybe an entrepreneur in Norway or in Nigeria, or, you know, Bolivia, can look at this standard and go, hmm, this is a blueprint for me to build an app, to build a data management environment where there are no copies and data owners have full control of their information. So that's another exciting thing that we do that sort of complements all the other projects as well.


Debbie Reynolds  29:35

Very good, very good. So I want to get down and dirty in the weeds to talk about data duplication. So you and I know why data duplication is a problem, but I think we should just sort of expand upon that a bit. So I'll jump in. Okay, so I don't like duplicates. Duplicates are problematic. They waste time and waste space. So touching on your sustainability problem, you know, it creates more costs for organizations, and it creates more risk for organizations. I know a lot of corporate folks are data-centric; I'm actually getting involved with people in business intelligence and stuff like that. And duplication can wreak havoc on that. So you may have data that's stale and old and not accurate; you know, duplicates bring up a multitude of issues that people don't really talk about. And a lot of that, like you said, is based on how applications were made. So every new application was like a new bucket that you put stuff in, and then you have people. So we're all human, where someone says, well, I want my own copy of this, or I want this or that, or I want to do something different. So now you have all these different versions of things, too, that are out of control. And they don't know where they are. And a lot of times, that creates a huge risk, a huge cyber risk. Because that data can be old data, and it may or may not have a huge business purpose at the moment, but a hacker or a cybercriminal would love to get their hands on it, because they can do something with that data. Or you have a situation, you know, like I said, in business intelligence, where you're making decisions on data that's not correct. It's not accurate. It's not up to date. So there are a multitude of problems with data, and then privacy. So a lot of privacy regulations touch on retention, data retention, so you have to really shore that up.
So just give me your thoughts on duplication and the kind of problems there.


Chris McLellan  31:51

Yeah, as you're going through that, Debbie, you know, you think about a lot of the words we use around data, like silos, fragmentation, copies, erosion. I heard a new one, I should have known this one before, but data exhaust. I mean, if you add them all up, it doesn't sound like we're treating data with the respect it deserves. And so with copies, you know, a light bulb goes off, and that has gone off for you and me. If you think about data as a global issue and challenge, it can be a bit overwhelming. But sometimes, you know, you get that old story of the Gordian Knot; sometimes somebody comes up with, like, an analogy or a phrase that sort of, you know, cuts to the heart of the problem. And for you and me and others, it's copies, all the millions of copies of our own personal information that are made, and how do you expect anyone to control, audit, or protect that? It's just, you know, I think even a young person, a six-year-old, could grasp that principle. And, you know, again, it's what I was mentioning at the start, it's what we do with money. Now, do anti-counterfeiting measures for money prevent some people from trying to copy it? No, but do they stop most people from copying it? Yes. And so there's no Nirvana here; there'll always be some copies. And in some scenarios, there need to be some copies. And I'm not suggesting that even a version of data could be considered a copy. But not all copies are created equal, as you and I know; when they shift environments, hosting environments, when a copy shifts jurisdiction, then, you know, there's another level of problem with that copy. But, you know, some thoughts about copying: our hashtag at the Data Collaboration Alliance is "access, not copies." And the reason I love that is because it really cuts to the heart of the issue of data. Is it really so simple as data minimization and eliminating copies? Kind of. A little bit.
Yeah, because what we do right now, as privacy and protection consultants and software vendors and privacy tech vendors, is chase copies. We're trying to play Whack-a-Mole with data that will never be contained. What we need to do is address the root cause of the problem, which is copies and silos. And I love that phrase and that terminology, because it really does sum up the problem in a pretty neat package.


Debbie Reynolds  34:27

Absolutely. I have been anti-copy since before I was even saying that. So yeah, legacy data, that's kind of the bane of my existence. So it's like, oh my God, why do you have this? And people can't give you a good explanation. I think I call it, like, hoarders, the corporate edition, you know? So they say, well, let's keep this data because, we don't know, you know, someone told me data is valuable. So let's keep it and not get rid of it. Or let's put it in a back room, or let's put it on our server. So I think the shift that's happening now is that before, there weren't any huge risks that people thought about, either financial or business, for keeping that data. And now there are. So now there's cyber risk, now there's data risk, and there's regulatory risk. I think the recent T-Mobile data breach is an example of this. So the cybercriminals breached data that was many years old, for people who had applied for accounts with T-Mobile. So this data didn't have a high business value to the organization at the moment, right? So it wasn't like the high-value, you know, super protected data; this was kind of old stuff. And literally, if they had, like, deleted it, it would not have been a problem. But a lot of times, those things get put on old systems, they get put in back rooms, they get thrown in the cloud. You know, over the years, the knowledge within organizations gets lost. So, you know, Jasper down the hall, he's the only one who knows what happens with this data, or something like that. So all those things create risk, and that is, you know, attributable to copying information, and also to not understanding that things that have a lower business value may also have a higher business risk from a cyber and privacy perspective. So I think the change is happening now, hopefully, and that makes me happy.
If people really think through the purpose of what they're doing with their data, and the data is no longer serving that purpose, that needs to be a trigger point for them to decide what they need to do with it. And then, on a going-forward basis, making sure that they're, again, tying their data collection to a purpose, not collecting more than they need, and then figuring out what that trigger will be for when they need to, you know, delete or move that data from their environment.


Chris McLellan  37:08

Yeah, and we describe it as a shift; this is not a megaproject or undertaking. This is a gradual shift from chaos to control, as I sometimes describe it, that we're trying to support. And when you were talking about copies, it reminded me that, you know, there are so many negative consequences, not just for privacy. Think about data portability, think about the right to be forgotten. Like, everything good you can imagine about data collaboration becomes exponentially more difficult, and I would argue, in many cases, impossible, with exponential copies. And so are we kidding ourselves, in that we're going to try to make these amazing outcomes, like portability from system A to system B, company A to company C, while leaving thousands of copies behind? What have we achieved there? And so, with all these GDPR-like outcomes, it's easy to fine people. But, you know, they're not doing it, like I said, for the most part, because they're evil. They're doing it because that's how technology is built. So let's find a new way, called data collaboration, to build technologies without copies. Let's start building technology, including analytics, including systems, including automation, including AI/ML algorithms, differently. Let's do it without copies. Let's do it through collaboration.


Debbie Reynolds  38:35

I will talk about the future now. So what thing is on the horizon that concerns you most, especially as it relates to privacy?


Chris McLellan  38:47

Well, I guess I think one of the biggest things is just that it's so confusing. Like, there's data everything, right? So take something like data virtualization. Well, you could argue it's really a distributed query from many systems back into a new system for the purpose of analytics. Now, that doesn't generate copies. It's like holding up a mirror. Okay, sure, we can argue the semantics of that statement. But okay. But it's not doing anything to eliminate the silos that exist. And it's doing nothing to eliminate the silos of the future. Do you know what I mean? Because you're still going to build your new app, system, or automation based on a new silo, and it's doing nothing to eliminate spreadsheets either. So then, what worries me most, to your question, is the obfuscation of the whole market of data management. Like, it took me a couple of years, honestly. You know, I described my journey from marketing into data; it's taken me years to get my head around the hundreds of terms, you know, the vast lexicon of data. And if you go to each and every vendor's website, it is claiming to solve the problem. Well, they're not; they're solving a symptom of the problem. And that's okay. And that's why only a nonprofit, and only, you know, a group of people sort of focusing around a common cause, I think, can really help sharpen the debate. And that's another thing we're trying to do. The problem is copies. And if a solution is not addressing the problem of copies, then it is treating the symptom. And there's nothing wrong with that. We need those people and those innovators. But let's not kid ourselves; we're putting a bandaid on an increasingly difficult wound in global IT infrastructure. And that is called data integration; that is called copies. And so what worries me is that the language is so completely confusing that people can't see through to what the actual solution is.


Debbie Reynolds  40:51

Wow, I like the way you put that. Yeah. So as you're talking, I'm thinking, what we seem to have now are a bunch of tactics instead of a strategy. So we need something at a higher level to talk about and discuss and figure out the best way to go forward, as opposed to, like you say, treating, you know, the symptom and giving someone a bandaid for a gunshot wound, right?


Chris McLellan  41:19

I was going to use that. That was the metaphor I was going to use, but I was, well, sensitive about that. But that is the right description, Debbie, thank you.


Debbie Reynolds  41:29

Thank you very much. Thank you very much. So if it were the world according to Chris, and we did everything that you said, what would be your wish for privacy anywhere in the world, whether it's human, technology, regulation, anything?


Chris McLellan  41:46

I suppose it's that I think we've established what we would like data to be, i.e., portable, controlled, you know, able to be deleted when you leave a company or whatever. And I think I would like to see the shift away from fining innovators into oblivion, which has its place for those who have evil intent, and a pivot more towards the positive, collaborative, unlimited, untapped potential of working and collaborating on data, if we can only find a way to share it without losing agency, which is exactly what we're trying to do. So let's think about that. I think we're losing the purpose here, and the potential. And so I often tell people about this thing I learned about a few years ago at Johns Hopkins University in Baltimore, where they were able to get consent and access to, I think, 1,000 patients who had pancreatic cancer, and they were able to look at their search history, just their search history, I think going back three or four years, and they were able to use that data to train an algorithm to predict their condition, which is obviously about as serious as healthcare conditions can get. And it was able to predict it something like 12 months before they actually did it in the real world. Let's not forget that. That is just one problem and one outcome that came from one collaboration with 1,000 people. Imagine what we can do if we could only find a way to collaborate on information for all the other things we mentioned, like all the other challenges of social inclusion, intelligent cities, precision health care, sustainability, planting trees, getting a hold of data legislation, all of these things. You know, just imagine what could be done. And that's why it's so important. And I would like to see the world move a little bit. I think you and I discussed this the other day, and I would love to see a world where data privacy consultants become data collaboration consultants.
It's not about "how can I protect your data?" That should be a given. It's about who you should decide to give access to your data, what projects you would grant access to your information, and how you make those decisions, ethically and otherwise. That's what I would love to see: the data privacy consulting community evolving into a data collaboration community, where ownership and control of information are a given, and it's the collaboration and those decisions, ethical and otherwise, as I said, that become the basis to focus on.


Debbie Reynolds  44:29

Wow, that's a big task. So we have a lot of work ahead of us, don't we?


Chris McLellan  44:33

Indeed. But it's good work. There'll be a lot of it, trust me. Oh, yeah.


Debbie Reynolds  44:39

Definitely. Definitely. So, well, thank you so much for being on the show. This was great. I am so glad that we were able to chat today and that you were able to give more detail about the Data Collaboration Alliance. I also want to give a shout-out: you all do some stellar content. So I totally recommend that people go on to DataCollaboration.org, right? Yeah, that's right. Okay, I want to make sure I have that right. And check out the content that Chris and his team put out. So, you know, you have podcasts you put out, you have a lot of video content that you put out. I like the little things that you put out as the Data Drop news, with different articles or different things. And, you know, I do, like, Google research. So I'm constantly researching; I have been doing it for over 20 years. So you even find things that I haven't seen, and that's hard to do. So the fact that you can do that is great. And, you know, you keep up with a lot of developments that are happening around the world. So I highly recommend that people plug into the Data Collaboration Alliance, you know, check out the content, get involved. And, you know, these datasets they have available, if that's something that you need, go on to his website and plug into that. And, you know, definitely collaborators, people who want to roll up their sleeves: if there is a project that interests you, definitely get involved, or contact me, and I'll put you in touch with Chris. And we can see if we can get people involved, because it takes, I don't want to say it takes a village, but it takes people, you know; these are human problems, right? So it takes many different types of humans, not just, you know, scholars at universities, not just people, you know, speaking at conferences or putting out white papers. So there are a lot of levels to this.
And we need a lot of people with different talents and skills, from all kinds of walks of life, who are interested in data.


Chris McLellan  44:46

I'm thinking it takes a mindset as well, you know. It's like, let's roll up our sleeves, like you said. This stuff isn't mysterious or vaporous; data has a physical presence, and you can collaborate on it. And it's fun. And you'll be happy to know that, for the Data Drop podcast you mentioned that we put out, all the data, all the news stories we share, are available as a dataset within the Node Zero environment as well. So that's just another example. The data is the rock star, and our community is just fans of the potential of data and what can be done with it.


Debbie Reynolds  47:25

Excellent. This was a fantastic episode. Thank you so much again, and we'll chat soon.


Chris McLellan  47:31

Thank you, Debbie. And thank you for being a part of our community and such a great advisor to what we do. I love what you do. I watch your videos religiously. And so keep doing what you do.


Debbie Reynolds  47:43

I shall. Thank you so much.

