[Tech] Separation of Storage and Compute (Databases) - Nikita Shamgunov
Listen to the Changelog: https://changelog.com/podcast/510
Transcript
So elastic compute makes sense, and scaling down, because you have ephemeral, on-demand resource usage, right? Like, all of a sudden I have to answer a bunch of HTTP requests, and so my server has to do stuff, and then everybody leaves, my website doesn’t get any hits, and I can scale that down. With databases, if I’ve got a one-gigabyte database, it’s just always there. I mean, all that data is there, and I could access any part of it at any time, or I might need to… and we don’t know which parts. So I have a hard time with a database scaling to zero, unless you’re – I don’t know – just stomaching the cost… Or tell us how that works with Neon. Are you just stomaching the cost of keeping that online, or are you actually scaling it down?
NIKITA SHAMGUNOV
We’re actually scaling that down. Let me explain how this works, and it may get quite technical. The first thing is: what should be the enabling technology for scaling that down? If you’re just thinking, “How would I build serverless Postgres?” and you ask a person who is not familiar with database internals, they would say something like, “Well, I would put it in a VM maybe, or I would put it in a container, I would put that stuff into Kubernetes… Maybe I can change the size of the containers…” The issue with all that is that as you start moving those containers around, you start breaking connections, because databases like to have persistent connections to them. And then you will be impacting your cache. Databases like to have a working set in memory, and if you don’t have the working set in memory, you’re paying a performance hit by bringing that data from cold storage into memory.
The third thing you will find out is that if the database is large enough, it’s really, really hard to move a database from host to host, because that involves data transfer, and data transfers are just long and expensive. And now you need to do it live, while the application is running and hitting the system. And so naively, you would arrive at something like what you proposed – just stomach the costs. There is a better approach, though… And the better approach starts with an architectural change: separating storage and compute.
If you look at how database storage works at a high level, it’s what is called page-based storage; all the data in the database is split into 8-kilobyte pages. And the storage subsystem basically reads and writes those pages from disk, and caches those pages in memory. And then the upper-level system in the database lays out data on the pages.
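A minimal sketch of that page-based storage idea, assuming Postgres’s default 8 KB page size; the names here are illustrative, not Postgres’s actual internals:

```go
// Sketch of a page-oriented storage layer: fixed-size pages, and a narrow
// interface the rest of the database uses to read and write them.
package storage

const PageSize = 8192 // bytes per page (Postgres's default block size)

type PageID uint64

// Page is a fixed-size block; higher layers decide how rows are laid out on it.
type Page [PageSize]byte

// PageStore is the narrow API the upper layers see: read a page, write a page.
// A buffer cache typically sits in front of an implementation of this.
type PageStore interface {
	ReadPage(id PageID) (*Page, error)
	WritePage(id PageID, p *Page) error
}
```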
So now you can separate that storage subsystem, and move it away from compute into a cloud service. And because that storage subsystem is relatively simple from the API standpoint – the API is “read a page, write a page” – you can make that part multi-tenant. And so now you start amortizing costs across all your clients. So if you make that multi-tenant, and you make that distributed – and distributed key-value stores, you know, we’ve been building them forever, so it’s not rocket science anymore – then you can make that key-value store very, very efficient, including being cost-efficient. And the cost efficiency comes from taking some of the data that’s stored there and offloading the cold data into S3.
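A rough sketch of the read path for such a multi-tenant page service – a local key-value store for hot pages, with cold pages pulled back from object storage. `ObjectStore`, `PageKey`, and `GetPage` are hypothetical stand-ins, not Neon’s actual API:

```go
// Read path: serve hot pages from the local store, fall back to cold storage.
package pageservice

import (
	"context"
	"fmt"
)

// PageKey identifies a page for a specific tenant, making the service multi-tenant.
type PageKey struct {
	Tenant string
	Page   uint64
}

// ObjectStore stands in for cold storage such as S3.
type ObjectStore interface {
	Get(ctx context.Context, key string) ([]byte, error)
}

type PageService struct {
	hot  map[PageKey][]byte // local key-value store for hot pages
	cold ObjectStore        // offloaded cold pages
}

func NewPageService(cold ObjectStore) *PageService {
	return &PageService{hot: make(map[PageKey][]byte), cold: cold}
}

func (s *PageService) GetPage(ctx context.Context, k PageKey) ([]byte, error) {
	if p, ok := s.hot[k]; ok {
		return p, nil // hot path: served from local storage
	}
	// Cold path: fetch from object storage and re-warm the local store.
	p, err := s.cold.Get(ctx, fmt.Sprintf("%s/%d", k.Tenant, k.Page))
	if err != nil {
		return nil, err
	}
	s.hot[k] = p
	return p, nil
}
```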
[20:13] Now, that leaves compute. And compute is the SQL query processor, and caching. So that you can put in a VM. We actually started with containers, but we quickly realized that micro VMs such as Firecracker or Cloud Hypervisor are the right answer here. And those micro VMs have very, very nice properties to them. First of all, we can scale them to zero and preserve the state. And they come back up really, really quickly. And so that allows us to even preserve caches if we shut that down.
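A hedged sketch of the scale-to-zero policy being described – suspend a compute VM once it has been idle for a while, preserving its state so caches survive the resume. `Compute` and its methods are hypothetical, not the Firecracker or Cloud Hypervisor API:

```go
// Scale-to-zero controller: poll for activity, suspend after an idle timeout.
package autoscaler

import "time"

type Compute interface {
	ActiveConnections() int
	Suspend() error // pause the micro VM, keeping memory state so caches survive
}

func ScaleToZero(c Compute, idleTimeout time.Duration) {
	idleSince := time.Now()
	for range time.Tick(10 * time.Second) {
		if c.ActiveConnections() > 0 {
			idleSince = time.Now() // still busy; reset the idle clock
			continue
		}
		if time.Since(idleSince) >= idleTimeout {
			_ = c.Suspend() // idle long enough: scale this compute to zero
			return
		}
	}
}
```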
The second thing it allows us to do is live-change the amount of CPU and RAM we’re allocating to the VM. That’s where it gets really tricky, because we need to modify Postgres as well, to be able to adjust to “Suddenly you have more memory”, or shrink down to “Oh, all of a sudden, I have less memory now.” And so if you all of a sudden have less memory, you need to release some of the caches, return that memory to the operating system, and then we change the amount of memory available to the VM. And there’s a lot of cool technology there, with live-changing the amount of CPU; and there’s another one called memory ballooning, which allows you to, at the end of the day, adjust the amount of memory available to Postgres.
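A sketch of the downscale ordering described here, with hypothetical hooks: ask the (modified) database to shrink its caches and return memory to the OS first, and only then take memory away from the VM via the balloon:

```go
// Downscale path: shrink database caches before shrinking the VM's memory.
package resizer

type Database interface {
	// ShrinkCaches asks the (modified) database to release cache memory down
	// to targetBytes and return it to the operating system.
	ShrinkCaches(targetBytes uint64) error
}

type Balloon interface {
	// SetVMMemory changes the memory visible to the guest via ballooning.
	SetVMMemory(targetBytes uint64) error
}

// Downscale orders the steps so the guest never believes it has more memory
// than it actually does: release caches first, then reclaim memory from the VM.
func Downscale(db Database, b Balloon, targetBytes uint64) error {
	if err := db.ShrinkCaches(targetBytes); err != nil {
		return err
	}
	return b.SetVMMemory(targetBytes)
}
```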
And then you can live-migrate VMs from host to host. Obviously, if you put multiple VMs on a host and they all start growing, at some point you don’t have enough space on the host. Now you have to make a decision – which ones do you want to remove from the host? Maybe you have a brand new host available for them, with the space… But there is an application running, with a TCP connection, hitting that system. Storage is separate, so you only need to move the compute. And so now you’re not moving terabytes of data when moving Postgres, you’re just moving the compute part, which is really the caches, and caches only. But you need to perform a live migration here. So that’s what we’re doing with this technology called Cloud Hypervisor, which supports live migrations. And the coolest part is, as you’re performing the live migration, you’re not even terminating the TCP connection. So you can have the workload keep hitting the system as you change the size of the VM for the compute up and down, as well as change the host for that VM, and the application just keeps running… So yeah, that’s kind of super-exciting technology.
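A rough illustration of the scheduling decision involved: when a host runs out of headroom, pick a VM to live-migrate to a host with space – and because storage is disaggregated, only the compute VM (its caches and connections) moves, not the data pages. The types are illustrative, not an actual control-plane API:

```go
// Placement decision: choose which compute VM to live-migrate off a full host.
package scheduler

type VM struct {
	ID       string
	MemBytes uint64
}

type Host struct {
	ID        string
	FreeBytes uint64
	VMs       []VM
}

// PickMigration chooses the smallest VM on the overloaded host that fits on
// the target, to keep the live migration (and cache transfer) cheap.
func PickMigration(overloaded Host, target Host) (VM, bool) {
	var best VM
	found := false
	for _, vm := range overloaded.VMs {
		if vm.MemBytes <= target.FreeBytes && (!found || vm.MemBytes < best.MemBytes) {
			best = vm
			found = true
		}
	}
	return best, found
}
```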
JEROD SANTO
So do you have your own infrastructure that this is running on, or are you on top of a public cloud, or how does that all work?
NIKITA SHAMGUNOV
So we are on top of AWS. We know that we need to be on every public cl...