Listen to Software Engineering Daily: https://softwareengineeringdaily.com/2021/04/07/chainlink-connecting-smart-contracts-to-external-data-with-sergey-nazarov/
Transcript
JM: Tell me a little bit more about the data sources for Chainlink. Like how do those
data sources get vetted and how does the data make its way onto the chain?
SN: Right, absolutely. So there are actually two approaches here, and I think they're
both important, and the flexibility of how you acquire data is important. The first approach is that
you have an oracle network and that oracle network is a collection of nodes that are incentivized
just like blockchain miners and Bitcoin miners are incentivized. Those nodes are incentivized to
go out and get accurate data in order to generate the most accurate, highly reliable result
possible.
In the first version of how data is put into a smart contract, this oracle network of anywhere from
seven to over 30 nodes basically goes to an API at a data provider that is considered a high-quality
data provider. Often that's determined by users. So users will say, “Hey, we want that data
provider.” Chainlink also has a reputation system where we track how well each node, and
increasingly each data provider, is performing. And so better data providers
get to continue selling their data to Chainlink networks, whereas worse data providers are
not used as much by node operators because they're either not responsive or not returning the right
results. And so there's actually a reputation system baked into Chainlink, and it's quite
fascinating because the system inherently puts all of the data on chain and generates a lot of
proof about what's going on with the oracles.
In any case, in the first variant of the system you can go to any data provider, you can go to
really any API in the world and you can request from it and you can come to consensus on the
data from that source assuming you can get other sources or you can come to some model of
consensus that the user wants around that data. And that doesn't require the data provider to do
anything, right? So the benefit of this system is that you have a layer of consensus and you
have a lot of proof that the data was acquired from a data provider and the data providers don't
need to change anything about their infrastructure, right? So the data providers just continue to
provide their APIs, operate the way they have always operated and just do what they're
supposed to be doing. This is the system through which a good amount of the data is acquired
and then the data providers are more than happy to sell their data to Chainlink nodes because
it's consumed into these applications which they're all excited about.
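A minimal sketch in Python of the aggregation idea in this first variant, assuming each node has already fetched the same value from its chosen API; the node names, values, and the median rule are illustrative assumptions, not Chainlink's actual on-chain aggregation logic.

    from statistics import median

    # Hypothetical responses from an oracle network of independent nodes,
    # each of which queried its own data-provider API for the same value.
    node_responses = {
        "node-a": 42015.3,
        "node-b": 42013.9,
        "node-c": 42101.2,   # an outlier or stale response
        "node-d": 42014.7,
        "node-e": 42016.1,
        "node-f": 42015.0,
        "node-g": 42014.2,
    }

    def aggregate(responses: dict[str, float], min_responses: int = 5) -> float:
        """Combine independent node answers into a single result.

        Taking the median means a minority of wrong or manipulated
        responses cannot move the reported value.
        """
        if len(responses) < min_responses:
            raise ValueError("not enough oracle responses to aggregate")
        return median(responses.values())

    print(aggregate(node_responses))  # -> 42015.0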
The second version is when a data provider runs their own Chainlink node. And what that
basically means is the data provider gets a lightweight signing appliance. They basically get a
lightweight signing application that allows them to connect their APIs internally to their own
official node. And then that node publishes a contract on-chain, and that on-chain contract is a
representation of that data provider. So now there's an on-chain contract that's the
representation of that data providers services. And that on-chain contract gets requests from
other smart contracts for data to be given to them because, once again, a blockchain cannot
talk to an API. A blockchain has to have an oracle to speak with any API in the outside real world.
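A toy sketch of the “lightweight signing appliance” idea, using the third-party Python cryptography package: the provider's own node reads from its internal API and signs the result, so a consumer of the published report can check that it really came from the official provider node. The payload format and key handling here are illustrative assumptions, not Chainlink's actual node software.

    import json
    import time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    provider_key = Ed25519PrivateKey.generate()       # the provider's signing key
    provider_pubkey = provider_key.public_key()       # published so anyone can verify

    def fetch_from_internal_api() -> dict:
        # Stand-in for the provider's existing API; nothing about the API changes.
        return {"feed": "ETH/USD", "value": 3412.55, "timestamp": int(time.time())}

    def publish_signed_report() -> tuple[bytes, bytes]:
        report = json.dumps(fetch_from_internal_api(), sort_keys=True).encode()
        signature = provider_key.sign(report)
        return report, signature                      # what the node would post on-chain

    report, signature = publish_signed_report()
    provider_pubkey.verify(signature, report)         # raises InvalidSignature if tampered
    print("report verified:", json.loads(report))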
And so the second variant is for data providers that are more interested in selling
their data to the blockchain ecosystem, or are more convinced about it, and we have many data
providers already doing this live. We have data for sports events, weather events, market
events, all kinds of things out in the real-world already live on production with data providers
running their own production nodes. This variant allows you to get data essentially directly from
an official node run by a data provider. It has the benefits of getting data directly from a data
provider running their own node. It has the limitation that the data provider now has
to make sure that they are properly connected, that their APIs stay up and are reachable by the node, and
all these other kinds of nuances. The benefit that they get is that they are connected to many
different chains all at once. And in reality this variant basically requires the data provider to want
to opt in to some kind of infrastructure. It requires them to want to say, “Hey, I want to kind
of run a function in the cloud or I want to run some kind of node myself and I want to make a
technical investment in that.”
What we found so far is that the majority of data providers just want to sell their data to
somebody and they want to provide that to an oracle network that just retrieves their data and
sells that data successfully to a smart contract. There are some data providers that want to run
their own node and we're working with a lot of those, but I think that's something that's going to
evolve more slowly.
[00:16:33] JM: You mentioned this reputation system for how data gets verified as quality. How
does that reputation system work? How do you vet and ensure quality data?
[00:16:45] SN: So once again there are two levels. There's one level of the node operators and
ensuring that they're operating properly, and then there's the level of the data providers
responding properly. In terms of the node operators, the way that the Chainlink system works is
that node operators are committing to certain service level commitments, right? They're
basically, in many cases, on-chain committing to a certain degree of service. And they're
committing to that because the on-chain activity that they do is immediately public to everybody
as soon as it happens.
So I think the big nuanced difference between a reputation system in the web world and a
reputation system in the blockchain world is that data is immediately available publicly. It is
immediately available for people to know that a node did not respond for a certain period of
time. And that lack of response is recorded on-chain immutably for everybody to analyze. And
we actually have multiple ecosystem teams. We have multiple kind of block explorer-like things
and marketplaces that are all able to analyze the same data about both node operators and
data providers.
So basically the way that it looks is that the node operators are expected to perform to a certain
degree on-chain. Those expectations are clear. They are then able to perform, or in some cases
if they're not able to perform, they are not able to stay on that oracle network. And then the data
providers themselves, for the ones that run their own nodes, it becomes pretty clear what their
responses are, and if their responses are often wrong, then you know once again that data
provider and their node might not be used in an aggregation. They might not be applied to that
aggregation.
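A rough sketch of how those publicly recorded rounds could feed a reputation check: count how often each node responded and how close its answer was to the agreed value, then keep only nodes above a threshold for the next aggregation. The record format, tolerance, and threshold are illustrative assumptions rather than Chainlink's actual reputation system.

    from statistics import median

    # Each round as recorded on-chain: node -> reported value (None = no response).
    rounds = [
        {"node-a": 100.1, "node-b": 100.0, "node-c": None,  "node-d": 100.2},
        {"node-a": 101.4, "node-b": 101.5, "node-c": 120.0, "node-d": 101.5},
        {"node-a": 102.0, "node-b": 102.1, "node-c": None,  "node-d": 102.0},
    ]

    def reliability(node: str, tolerance: float = 0.01) -> float:
        """Fraction of rounds where the node responded close to the agreed value."""
        good = 0
        for rnd in rounds:
            answers = [v for v in rnd.values() if v is not None]
            agreed = median(answers)
            value = rnd.get(node)
            if value is not None and abs(value - agreed) / agreed <= tolerance:
                good += 1
        return good / len(rounds)

    nodes = {n for rnd in rounds for n in rnd}
    trusted = {n for n in nodes if reliability(n) >= 0.9}
    print(sorted(trusted))  # node-c drops out: missed rounds plus an outlier response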
In the cases where a node operator gets data from a data source, a lot of that data is actually
more internal to the oracle network and that data is something that's in the process of getting
published on chain. So there is a certain amount of insight that node operators have about the
responsiveness of different data providers and different data sources. At this point the reputation
system extends to node operators and to the node operators that are data sources. It will
continue and is already being extended to cover data providers...