From tutorial video
- Allegedly framework-agnostic ML serving tool for Kubernetes
- Integrates with Kubernetes so that deploying an ML model is basically the same as deploying a Kubernetes service
- Deployed models get three out-of-the-box components:
- gRPC serving
- HTTP serving
- Interactive (Swagger) UI
- Incorporates routing capabilities (shadowing, A/B tests); see the manifest sketch after this list
- Depends on a 3rd party ingress controller
- Istio
- Ambassador
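As an illustration, traffic splitting is expressed directly in the SeldonDeployment manifest by giving each predictor a traffic weight. A minimal sketch, assuming placeholder names and model URIs:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: ab-test                            # placeholder name
spec:
  predictors:
    - name: main                           # receives 75% of requests
      traffic: 75
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://my-bucket/model-a   # placeholder URI
    - name: canary                         # receives 25% of requests
      traffic: 25
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://my-bucket/model-b   # placeholder URI
```

A shadow is declared similarly with shadow: true on the second predictor; it receives mirrored traffic and its responses are discarded.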
Quickstart
- Not just for Python
- Has wrappers for Java, R, NodeJS
- Provides an image building tool called s2i (for non-reusable servers); see the build command after this list
- Exposes out-of-the-box metrics for Prometheus
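For the non-reusable route, the build is a single s2i command. A sketch, assuming the Python builder image (seldonio/seldon-core-s2i-python3) and placeholder tags:

```bash
# Build a model image from the current directory with Seldon's s2i builder.
# Builder tag and output image name are placeholders.
s2i build . seldonio/seldon-core-s2i-python3:1.14.0 my-model:0.1
```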
Overview of components
- Two types of model servers:
- Reusable / prepackaged servers: used to serve a family of similar models. The server programmatically retrieves the model itself from cloud storage at runtime.
- Non-reusable servers: the model is packaged with Seldon in a single image.
- There are pre-built servers for major frameworks (e.g. XGBoost, TensorFlow, Hugging Face, scikit-learn)
- If you don’t need to do any custom input transformation, you can deploy these with just a .yaml manifest
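A minimal sketch of such a manifest for the prepackaged scikit-learn server, with placeholder name and modelUri:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model                              # placeholder
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        implementation: SKLEARN_SERVER          # pre-built reusable server
        modelUri: gs://my-bucket/sklearn/iris   # model pulled from cloud storage at runtime
```

Deploying is then just a kubectl apply -f on this file.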
- The SeldonDeployment Kubernetes CRD facilitates configuration and management of models
- Benefit of Seldon over e.g. Flask:
- Out-of-the-box configurable K8s deployments (manifests, ingress, etc.)
- Parametrizable, reusable containers
- Portability to other platforms
- Integrations
- Inference graphs
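A hedged sketch of how an inference graph appears in the manifest: nodes nest via children, so a transformer can feed a model. Names are placeholders, and the transformer node would need a matching container in componentSpecs:

```yaml
# Two-step graph: requests hit the transformer first, then its child model.
graph:
  name: feature-transformer
  type: TRANSFORMER
  children:
    - name: classifier
      type: MODEL
      implementation: SKLEARN_SERVER
      modelUri: gs://my-bucket/model   # placeholder
```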
TODO: Model metadata
TODO: Orchestration (1, 2, 3)
Integrations
Logging
- Configure an external logger in the manifest (sketch below)
- Can configure input and output HTTP payloads to be published as CloudEvents
- Can also publish logs to Kafka for downstream digestion
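The logger is attached per graph node in the manifest. A sketch, with a placeholder sink URL (e.g. a Knative broker):

```yaml
graph:
  name: classifier
  logger:
    mode: all   # log both request and response payloads; "request" / "response" also valid
    url: http://broker-ingress.knative-eventing.svc.cluster.local/seldon/default   # placeholder sink
```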
Batch processing
- Define a batch process on the command line (example after this list)
- Input and output data paths
- Seldon deployment name
- Number of workers
- It will run all the input data through the model and serialize the results
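A sketch of an invocation, using the seldon-batch-processor CLI from the Seldon Python package; host, paths, and names are placeholders:

```bash
seldon-batch-processor \
    --deployment-name iris-model \
    --namespace seldon \
    --host istio-ingressgateway.istio-system.svc.cluster.local:80 \
    --input-data-path input-data.txt \
    --output-data-path output-data.txt \
    --workers 10
```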
Benchmarking (1, 2)
- Metrics are exposed by the service orchestrator
- By default, you can get requests per second (RPS)
- The documentation claims additional metrics, but its “more info” link just points back to the same page
- The notebook doesn’t show them, either
- Custom metrics are created by including a metrics method in the custom Python (or whatever language) wrapper (sketch below)
- They are then accessed either via HTTP (on port 6000 by default) or directly in Kubernetes via Prometheus
- Behind the scenes, Prometheus still uses HTTP; it just scrapes the ports and provides a dashboard
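A minimal sketch of a Python wrapper with custom metrics, using the COUNTER/GAUGE/TIMER dict format from the Python wrapper docs (the model logic is a stub):

```python
class Model:
    """Seldon Python wrapper with custom metrics; predict is a placeholder."""

    def predict(self, X, features_names=None):
        # A real wrapper would run the model here.
        return X

    def metrics(self):
        # Each dict is exposed as a Prometheus metric on the metrics endpoint.
        return [
            {"type": "COUNTER", "key": "my_counter", "value": 1},   # incremented each call
            {"type": "GAUGE", "key": "my_gauge", "value": 100},     # set to the given value
            {"type": "TIMER", "key": "my_timer", "value": 20.2},    # observed duration (ms)
        ]
```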
Native Kafka integration
- All you need to do is choose serverType: kafka (plus input and output topics) in the deployment config (sketch below)
- Seldon will poll the input topic and publish predictions to the output topic
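A hedged sketch of the relevant manifest pieces, modeled on the Kafka example in the Seldon docs; broker and topic names are placeholders:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: kafka-model                       # placeholder
spec:
  serverType: kafka                       # switch the service orchestrator to Kafka mode
  predictors:
    - name: default
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://my-bucket/model    # placeholder
      svcOrchSpec:
        env:
          - name: KAFKA_BROKER
            value: my-broker.kafka:9092   # placeholder broker address
          - name: KAFKA_INPUT_TOPIC
            value: model-input            # topic Seldon polls
          - name: KAFKA_OUTPUT_TOPIC
            value: model-output           # topic Seldon publishes to
```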
TODO: Python wrapper (1, 2)
What’s the business model?
They sell a managed version of Seldon Core called Seldon Deploy