Video
Striveworks and Neural Magic Partner to Disappear MLOps
Discover how seamless AI workflows and efficient resource management are transforming model deployment across cloud, on-prem, and edge environments.
Is MLOps Disappearing? (Featuring Striveworks)
Transcript
(Automated transcription)
Jay Marshall:
So, starting with something that's sparse and then training in a way to preserve the sparsity. And then the data scientist, you know, selects their hyperparameters: what batch size they want, what optimizer, sampling strategy, the dataset to train on, obviously, and so forth.
Hey, everybody. I'm Jay Marshall with Neural Magic. Today I'm joined by Eric Korman, one of the cofounders of Striveworks, another great startup that we partnered with earlier this year. Thanks for joining me today, Eric.
Eric Korman:
Yeah, thanks for having me.
Jay Marshall:
So, at Neural Magic we've talked a lot lately about what we call the three F's: fast, frictionless, flexible. And for us, we're usually talking very specifically about optimizing models and then running those models in a very efficient way on regular CPU architectures. I know for you all at Striveworks, and Chariot specifically, you're delivering those same value props but extending them across the entire life cycle. So why don't you take a second and tell our audience here a little bit more about what you all do at Striveworks.
Eric Korman:
Yeah, sure. So, our core offering is Chariot, which is an end-to-end MLOps platform. And it's especially well-suited for deep learning.
So, think tasks in computer vision like object detection or image classification, and also natural language processing tasks. It takes you, in a very low-code, no-code way, through the whole model development life cycle: from getting data into the system, centrally stored and versioned, to getting it annotated for supervised training tasks, to training models on that data, to then cataloging, deploying, monitoring, and refining those models. And we do that with a very strong emphasis on your second 'F,' frictionless. To us, frictionless is the idea of 'Disappearing MLOps.' All that infrastructure is under the hood, so data scientists don't need to worry about setting up device drivers or getting their libraries in order. They can just say, in a very declarative way, how they want to train something in terms of model architecture and datasets, and that kicks off this whole process-as-code.
Jay Marshall:
Yeah, and I love that process-as-code term. My background is in enterprise architecture and cloud architecture with some of the mega Cloud providers, so I lived through that whole explosion in the 2010s around infrastructure-as-code and, obviously, CI/CD and DevOps. I love that 'Disappear MLOps' tagline.
Why don't you share some of the specific challenges, or even examples, that you've seen with folks in terms of helping that happen? That sounds like it'd be really exciting for folks in this space, especially right now.
Eric Korman:
Yeah, definitely. And it's definitely taken a lot of engineering work on our side to make this happen, to make this disappearance happen. One of the biggest issues is actually resource management.
So, one of the benefits of our platform is the diverse set of environments it can be deployed to. Yes, we deploy to the major Cloud providers, but we can also deploy on-prem. We have a lot of customers that are interested in that because, for example, their data is sensitive. And there, you're dealing with a very finite number of resources, and an easy way for MLOps to not be disappeared, to be very apparent, is if your model can't train because there are no GPUs available, right? So what's key for us in this partnership is that we can offload all of the model inference to CPU, which Neural Magic's software allows us to do. Then we can reserve the GPUs mostly for training.
And that really efficient use of resources really helps in the disappearing of MLOps.
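To make that CPU offload concrete, here is a minimal sketch of running inference with Neural Magic's DeepSparse runtime on a sparsified model pulled from the SparseZoo. The task and SparseZoo stub are illustrative assumptions for this example, not the exact models or settings Chariot uses.

```python
# Minimal sketch: CPU inference with the DeepSparse runtime.
# The task and SparseZoo stub below are illustrative assumptions,
# not the exact configuration Chariot ships with.
from deepsparse import Pipeline

# Create an inference pipeline backed by a sparsified ResNet-50 from the SparseZoo.
pipeline = Pipeline.create(
    task="image_classification",
    model_path="zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none",
)

# Run inference on ordinary CPU hardware; no GPU is required.
predictions = pipeline(images=["sample.jpg"])
print(predictions.labels, predictions.scores)
```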
Jay Marshall:
And again, this is also why we were looking forward to doing this short-but-sweet video, because I think a lot of times when we talk about getting the performance that we get on CPUs, it's really not about not having GPUs. It is, again, that flexibility and the ability, whether you're on the public Cloud, in your private data center, or, as you say, at the Edge, anywhere there's x86 or ARM compute, to squeeze that performance out and run it.
And so I love the fact that you're automating, again, the rest of that stack. Maybe you can give a quick example of what that end-to-end looks like. We have these optimized models that we build here at Neural Magic and offer up in what we call our SparseZoo; how does that show up in Striveworks? How does that work?
Eric Korman:
Yeah, so the standard process of training a model in a way that can be deployed by Neural Magic's software begins, in a very declarative way, with the data scientist just saying what they want to happen. There's a dropdown; they select what architecture to use. These are all architectures available through your SparseML library.
Then clicking go/start kicks off the process. The system will see what compute is available to run the training job and go ahead and train that model. Metrics like live training loss are reported back to the user so they can monitor the progress of training. Once the data scientist is satisfied with a model checkpoint, they can elevate it to Chariot's Model Catalog. That's now usable as an inference endpoint you can just post data to, and the main mechanism we look to deploy that through is Neural Magic, namely the DeepSparse runtime, so it can be deployed on CPUs without having to sacrifice any inference speed.
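For a rough sense of what that recipe-driven, sparsity-aware training looks like at the library level, here is a minimal SparseML PyTorch sketch. The model, toy data, and recipe path are placeholders for illustration; in Chariot this is all wired up automatically behind the declarative UI.

```python
# Rough sketch of recipe-driven, sparsity-aware training with SparseML's
# PyTorch integration. The model, toy data, and recipe path are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet50
from sparseml.pytorch.optim import ScheduledModifierManager

# Toy data stands in for the dataset a Chariot user would select.
data = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))
train_loader = DataLoader(data, batch_size=4)

model = resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# The recipe declaratively describes the pruning/quantization schedule;
# "recipe.yaml" is a placeholder path.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(int(manager.max_epochs)):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        loss.backward()
        optimizer.step()

# Remove training-time hooks before exporting the model for deployment.
manager.finalize(model)
```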
Jay Marshall:
So, I know we're going to be doing a lot of this stuff over the upcoming months and quarters. For both of us, we're doing things as it pertains to Edge device compute, and obviously LLMs are all the rage. So I'm looking forward to doing a lot more work together.
Eric Korman:
Yeah, likewise.
Jay Marshall:
Thanks so much for joining us again today. Reach out to us at www.neuralmagic.com or www.striveworks.com to keep up with everything we're doing together. Thanks so much.
Video Summary
In this video, Jay Marshall of Neural Magic interviews Eric Korman, cofounder of Striveworks, to discuss their collaboration and shared focus on efficient AI model deployment. They highlight Striveworks' Chariot, an end-to-end MLOps solution designed for deep learning tasks like computer vision and natural language processing. The incorporation of Neural Magic into Chariot means data scientists can offload all model inference to CPU, reserving GPUs for training. This resource efficiency is another step toward the goal of “disappearing MLOps.”
The Striveworks MLOps Ecosystem
Striveworks has relationships and deep integrations with many of the best organizations in the technology landscape. See our MLOps ecosystem.
Related Resources
Google Cloud C3D VMs Powered by AMD Make AI Accessible for Everyone
Watch Striveworks cofounder Eric Korman discuss Valor, the first-of-its-kind, open-source AI model evaluation service.
Striveworks and Neural Magic Mentioned in AMD's Advancing AI 2024
Striveworks and Neural Magic are running inference workloads on CPUs, saving money without sacrificing speed.