Introduction to Zero-Copy Integration

@tags:: #lit✍/📰️article/highlights
@links::
@ref:: Introduction to Zero-Copy Integration
@author:: Data Collaboration Alliance



Reference

Notes

Quote

(highlight:: So what you're seeing on the screen here is a pattern that's been the case for close to 40 years now: an application.
And, you know, you've probably heard the old expression "an app for everything." The other part of that is "a database for every app." So what you're seeing here is an application, an application database, and tables of data within the application database. This is deliberately very simple and iconic, and we'll be referring back to these icons continually as we go through this presentation. That's how we, broadly speaking, design and develop applications.
What's happened over the last few decades, and more and more every year, is that organizations have realized the importance and value of data. They've started to look for ways to bring it back together, to essentially un-silo or defragment it from these application-specific databases. Generally speaking, that's done through a process known as copy-based data integration, and it does what it sounds like: it exchanges copies between applications, using technologies many of you will be familiar with, such as APIs and ETLs. In its essence, this is about exchanging copies. What you're seeing here is the exchange of copies between application one and application two. But in reality, if this were your CRM application, the number of integrations it undergoes would be in the dozens, even hundreds.
So what's the problem with that? This process is very expensive and takes a lot of time. And probably worst of all, it's very rigid. Any time you want to change the integration between application one and application two, someone, usually from an IT team, has to add it to their to-do list, get around to it, and make that change. So it's very brittle, very slow, very expensive, and getting more expensive all the time.
For those of you coming at this from a data privacy, data protection, or data governance point of view, the thing to note is that, just like money, intellectual property, or even physical identity, it's common sense that it's very difficult to protect and control things that you copy at scale. So this is a growing problem of deep concern, not only for organizations' ability to be resilient and develop digital solutions quickly, but also for their ability to achieve data compliance and ensure data privacy.
Hey, Chris, just one thing to add to that. Yeah, absolutely. I think back to my experience working in large financial institutions. I worked at organizations that had 10,000-plus applications, some new, some old. And those 10,000 applications didn't equal 10,000 integrations; they equaled hundreds of thousands of integrations, with copies of data everywhere. Not only did that impede the ability to get intelligence, but as data privacy regulations finally started to kick in, getting control over that data became more than just an efficiency play. It became a mandatory requirement, yet it was near impossible. One thing I think the standard does is recognize that people weren't doing it this way because they wanted to. This is just how it worked.
No one wants to create copies of data. No one wants to lose control over their data. No one wants to spend half their budget moving data around inside their own organization. They just didn't have a choice. But technology has evolved, and that's where the standard comes in: to help organizations begin the journey of reversing that trend.)
- zero-copy integration,
- [note::What zero-copy data integration is]
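The copy-based pattern described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each application keeps its own in-memory "database" (a plain dict) and that an integration simply duplicates rows from one app into another; `AppDatabase`, `copy_integration`, and `max_point_to_point_links` are hypothetical names, not part of the standard or of any product. It shows two of the problems raised: a copy goes stale as soon as the source changes, and every pair of applications can need its own pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class AppDatabase:
    """Hypothetical application-specific database: table name -> {row_id: row}."""
    name: str
    tables: dict = field(default_factory=dict)


def copy_integration(source: AppDatabase, target: AppDatabase, table: str) -> None:
    """Copy-based integration: duplicate every row of `table` into the target app."""
    rows = source.tables.get(table, {})
    # dict(row) makes an independent duplicate, so later source edits do not flow through.
    target.tables[table] = {row_id: dict(row) for row_id, row in rows.items()}


def max_point_to_point_links(n_apps: int) -> int:
    """Upper bound on pairwise integrations among n apps: n * (n - 1) / 2."""
    return n_apps * (n_apps - 1) // 2


crm = AppDatabase("crm", {"customers": {1: {"name": "Ada", "email": "ada@example.com"}}})
billing = AppDatabase("billing")

copy_integration(crm, billing, "customers")                  # one point-to-point pipeline
crm.tables["customers"][1]["email"] = "ada@new.example.com"  # the source changes later

# The billing copy is stale until someone re-runs or rebuilds the pipeline.
print(billing.tables["customers"][1]["email"])  # ada@example.com
print(max_point_to_point_links(10_000))         # 49995000
```

The n(n-1)/2 figure is a theoretical ceiling, not a measurement: real organizations connect only some pairs, but the number of possible links grows quadratically with the number of applications, which is consistent with 10,000 applications producing far more than 10,000 integrations.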

Quote

(highlight:: If you just imagine the power grid, which decouples producers of energy from consumers of energy: some participants could be net producers, some net consumers, some just consumers, some just producers. I can put solar panels on my roof and power myself, then feed any surplus back into the grid. And when I'm short, I can draw down from the grid.
It's that decoupling that enables buildings to be constructed without a power plant inside every single home, and that enables scale and the management of complexity. Otherwise we wouldn't be where we are. This extends that same paradigm to the world of information management. We have to start to recognize that the concept of a customer cannot be owned by a single application or a collection of software code. It is a universal concept that spans all software, all applications, and that's the core aspect here.)
- [note::Analogy for 1st principle of zero-copy integration: decouple data from applications]
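One way to picture the first principle in code: rather than each app owning a private customer table, every app holds a reference to a single shared data layer and reads and writes the same record. This is a minimal in-memory sketch; `SharedDataLayer`, `App`, and the field names are illustrative assumptions, and a real data layer would be a networked, persistent service rather than a Python dict.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class SharedDataLayer:
    """Hypothetical shared data layer: the one home for the 'customer' concept."""
    customers: dict = field(default_factory=dict)

    def upsert_customer(self, customer_id: int, **attrs) -> None:
        # Any app may contribute attributes, like a producer feeding the grid.
        self.customers.setdefault(customer_id, {}).update(attrs)

    def get_customer(self, customer_id: int) -> MappingProxyType:
        # A live, read-only view of the stored record rather than a duplicate.
        return MappingProxyType(self.customers[customer_id])


class App:
    """An application holds a reference to the shared layer, not its own database."""

    def __init__(self, name: str, data: SharedDataLayer) -> None:
        self.name = name
        self.data = data


layer = SharedDataLayer()
crm = App("crm", layer)
billing = App("billing", layer)

crm.data.upsert_customer(1, name="Ada", email="ada@example.com")
billing.data.upsert_customer(1, plan="annual")             # billing contributes too
view = billing.data.get_customer(1)
crm.data.upsert_customer(1, email="ada@new.example.com")   # CRM updates later

print(dict(view))  # {'name': 'Ada', 'email': 'ada@new.example.com', 'plan': 'annual'}
```

Because `get_customer` returns a `MappingProxyType`, a read-only view over the stored dict, the billing app's `view` reflects the CRM's later email change without any re-sync.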

Quote

(highlight:: If you've ever used any collaboration technology, like Google Drive, SharePoint, or Box, anything that lets you create a document and give access to other parties, maybe read access, maybe change access: that access can be granted temporarily. It can be taken away. It's fully traceable and auditable, and it's an alternative to sending copies of email attachments back and forth while trying to work together on a document. This is taking that same paradigm and extending it to the world of data, right? This collaborative approach ultimately replaces the copy-paste integration approach, so it's more efficient, more scalable, and simpler. And this is really what enables the owner of data to actually have control over it, because they control the access policies: they can grant, revoke, and change them. Whereas if they send a copy over an API or an ETL, there's no way.
Once the copies are out the door, you've lost control. So I want to go back to a comment Frank made about giving users, in this case the users who ultimately own data, the ability to have that control. This is the way to make that happen: giving the rightful owners of data control by letting them define those policies.)
- [note::Analogy for 2nd principle of zero-copy integration: Access-based collaboration over copy-based collaboration]
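The grant, revoke, and audit behaviour described here can be sketched as follows. This is an illustrative Python model under stated assumptions: `SharedDataset`, `Grant`, the `"read"`/`"change"` permission names, and the time-to-live parameter are all hypothetical, chosen to mirror the document-sharing analogy rather than any specific product's API. `"change"` access is modelled as implying read access, but only reads are exercised.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Grant:
    grantee: str
    permission: str                        # "read" or "change"
    expires_at: Optional[datetime] = None  # None means the grant never expires


@dataclass
class SharedDataset:
    """Hypothetical owner-controlled dataset: collaborators get access, not copies."""
    owner: str
    rows: list
    grants: dict = field(default_factory=dict)     # grantee -> Grant
    audit_log: list = field(default_factory=list)  # (timestamp, actor, action, detail)

    def grant(self, grantee: str, permission: str,
              ttl: Optional[timedelta] = None, now: Optional[datetime] = None) -> None:
        now = now if now is not None else datetime.now(timezone.utc)
        expires = now + ttl if ttl is not None else None
        self.grants[grantee] = Grant(grantee, permission, expires)
        self.audit_log.append((now, self.owner, "grant", f"{permission} -> {grantee}"))

    def revoke(self, grantee: str, now: Optional[datetime] = None) -> None:
        now = now if now is not None else datetime.now(timezone.utc)
        self.grants.pop(grantee, None)
        self.audit_log.append((now, self.owner, "revoke", grantee))

    def read(self, who: str, now: Optional[datetime] = None) -> list:
        now = now if now is not None else datetime.now(timezone.utc)
        g = self.grants.get(who)
        allowed = who == self.owner or (
            g is not None
            and g.permission in ("read", "change")
            and (g.expires_at is None or now < g.expires_at)
        )
        self.audit_log.append((now, who, "read", "allowed" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{who} has no access")
        return self.rows  # the owner's rows themselves, not a duplicate


t0 = datetime(2024, 7, 1, tzinfo=timezone.utc)
ds = SharedDataset(owner="finance", rows=[{"account": 42, "balance": 100}])
ds.grant("marketing", "read", ttl=timedelta(days=7), now=t0)

print(ds.read("marketing", now=t0 + timedelta(days=1)))  # [{'account': 42, 'balance': 100}]
try:
    ds.read("marketing", now=t0 + timedelta(days=8))     # the 7-day grant has lapsed
except PermissionError as err:
    print(err)                                           # marketing has no access
```

Expiry is checked against a caller-supplied `now` so the example is deterministic. The design point is that the owner's policy, not a copy held elsewhere, decides every read, and each decision lands in the audit log.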

Quote

(highlight:: Yeah, this definitely reminds me of the earlier days of Agile software development, for any of you who have experience with that. That environment was really a reframe of how you think of software: as products, rather than "it's all just software." There's software for managing employee engagement, there's software for doing other things, and each of these needs to be treated like a product, with a product owner and someone who cares about the experience of its users, and so on.
This extends that same product-based approach to the world of information management, basically to enable governance to scale. Quite frankly, in a collaborative ecosystem you now have to worry about which data you have control over, and being able to organize that data into products and roll them up into domains is a critical enabler of that federation.)
- [note::3rd principle of zero-copy integration: govern and manage data as a product (like we do with software/apps)]
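A data-product catalog that rolls products up into domains might look like the sketch below. It assumes a very simple model in which each product has a name, a domain, a named owner, and the tables it covers; `DataProduct` and `roll_up_by_domain` are hypothetical names, not terms defined by the standard.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical data product: a governed dataset with an accountable owner."""
    name: str
    domain: str   # the business domain this product rolls up into
    owner: str    # product owner, as for a software product
    tables: list = field(default_factory=list)


def roll_up_by_domain(products: list) -> dict:
    """Group products by domain, keeping each product's owner visible per domain."""
    domains = defaultdict(dict)
    for product in products:
        domains[product.domain][product.name] = product.owner
    return dict(domains)


catalog = [
    DataProduct("customer_profiles", "sales", "sales-data-team", ["customers"]),
    DataProduct("opportunities", "sales", "sales-data-team", ["deals"]),
    DataProduct("payroll", "finance", "finance-data-team", ["salaries"]),
]

print(roll_up_by_domain(catalog))
```

Grouping by domain is what lets governance be federated: each domain can oversee its own products and owners instead of one central team governing every table.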

Quote

(highlight:: So for the purposes of this presentation, we divided the controls by type. This isn't an exhaustive list, but an indicative one: controls like who can view data, who can edit data, who can delete data, and who can query data. One that I think often gets overlooked is handing control of the controls over to someone else, which is how you unlock custodianship. That would always be temporary: the data owner would only ever grant control of the controls on a temporary basis, and they can always revoke those access grants. In terms of granularity, we're talking about controls set at the table level, the column level, the row level, and even right down to the cell, to speak to what was just mentioned about granularity of access controls. Then, in terms of who we're talking about, to whom are the controls granted? We're talking about other users within the digital ecosystem, and about groups of users. These could be operational teams like finance, marketing, or development, but they could also be branch locations or location-based teams. That becomes very relevant if they're working in different jurisdictions, under different data protection regulatory regimes, so being able to set controls by group is very important. And finally, you can also set these grants to give systems access to data. So it's not purely people. As far as the framework is concerned, access controls are defined once, at the data layer, not in application code.
And this applies to both data and metadata. It avoids application-specific, code-based controls, which easily fragment as the number of apps in your ecosystem proliferates. So what do you think about data-layer access controls? Yeah, I think conceptually, anyway, it's that simple. Picture a piece of data: in the health care industry, maybe it's information about a patient; in finance, information about a customer; or information about yourself. You wouldn't want an individual's ability to access that to be determined by the application they're interfacing with. It should be universally enforced and contextually aware. So it's really about ensuring that the controls are embedded in the data itself, such that regardless of how one interacts with it, whether through a metadata-driven experience, application one, application two, or application 222, if I don't have access, I don't have access, and if I have access, I have access. It should ultimately be that simple, and not application-dependent, which is how it is in the traditional approach.)
- [note::4th principle of zero-copy integration: Enforce controls at the data layer]
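The controls described above (typed actions, table-to-cell granularity, grants to users, groups, and systems, and temporary delegation of "control of the controls") can be sketched as a single enforcement point. This is a minimal in-memory model, not the standard's reference design: `DataLayer`, `Grant`, the action names, and the rule that only the owner may delegate `"manage"` and only with an expiry are assumptions drawn from the speakers' description. Metadata controls and contextual awareness are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Grant:
    principal: str                         # a user, a group, or a system such as "svc:reporting"
    action: str                            # "view", "edit", "delete", "query", or "manage"
    table: str
    column: Optional[str] = None           # None = every column
    row_id: Optional[int] = None           # None = every row; column + row_id = one cell
    expires_at: Optional[datetime] = None


class DataLayer:
    """Hypothetical data layer that stores the rows and the rules that govern them."""

    def __init__(self, owner: str, groups: dict) -> None:
        self.owner = owner
        self.groups = groups    # group name -> set of member principals
        self.tables: dict = {}  # table -> {row_id: {column: value}}
        self.grants: list = []

    def _identities(self, who: str) -> set:
        # A caller acts as itself plus every group it belongs to.
        return {who} | {g for g, members in self.groups.items() if who in members}

    def allowed(self, who, action, table, column=None, row_id=None, now=None) -> bool:
        if who == self.owner:
            return True
        now = now if now is not None else datetime.now(timezone.utc)
        ids = self._identities(who)
        return any(
            g.principal in ids and g.action == action and g.table == table
            and g.column in (None, column) and g.row_id in (None, row_id)
            and (g.expires_at is None or now < g.expires_at)
            for g in self.grants
        )

    def grant(self, by: str, new: Grant, now=None) -> None:
        if new.action == "manage":
            # "Control of the controls" comes only from the owner, and only temporarily.
            if by != self.owner:
                raise PermissionError("only the owner can delegate control")
            if new.expires_at is None:
                raise ValueError("delegated control must have an expiry")
        elif not self.allowed(by, "manage", new.table, now=now):
            raise PermissionError(f"{by} cannot manage grants on {new.table}")
        self.grants.append(new)

    def revoke(self, by: str, old: Grant, now=None) -> None:
        if not self.allowed(by, "manage", old.table, now=now):
            raise PermissionError(f"{by} cannot manage grants on {old.table}")
        self.grants.remove(old)

    def read_cell(self, who, table, row_id, column, now=None):
        # Enforced once, here, no matter which application sends the request.
        if not self.allowed(who, "view", table, column, row_id, now):
            raise PermissionError(f"{who} cannot view {table}[{row_id}].{column}")
        return self.tables[table][row_id][column]


t0 = datetime(2024, 7, 1, tzinfo=timezone.utc)
layer = DataLayer(owner="clinic", groups={"finance": {"alice"}})
layer.tables["patients"] = {1: {"name": "P. Smith", "diagnosis": "flu", "balance": 120}}

# Column level: the finance group may view only the balance column.
layer.grant("clinic", Grant("finance", "view", "patients", column="balance"))
# Temporary custodianship: bob may manage grants on this table until 1 Aug 2024.
layer.grant("clinic", Grant("bob", "manage", "patients",
                            expires_at=datetime(2024, 8, 1, tzinfo=timezone.utc)))
# Cell level, granted by the custodian to a system principal.
layer.grant("bob", Grant("svc:reporting", "view", "patients", column="name", row_id=1), now=t0)

print(layer.read_cell("alice", "patients", 1, "balance", now=t0))          # 120
print(layer.allowed("alice", "view", "patients", "diagnosis", 1, now=t0))  # False
print(layer.read_cell("svc:reporting", "patients", 1, "name", now=t0))     # P. Smith
```

Because every read goes through `allowed`, it makes no difference whether the request comes from application one, application two, or a reporting system: the same grants decide the outcome. A production system would more likely push this enforcement into the database engine itself (for example, row- and column-level security) rather than a Python class.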


dg-publish: true
created: 2024-07-01
modified: 2024-07-01
title: Introduction to Zero-Copy Integration
source: reader
