From tutorial video
- Allegedly framework-agnostic ML serving tool for Kubernetes
- Integrates with Kubernetes so that deploying an ML model is basically the same as deploying a Kubernetes service
- Deployed models get three out-of-the-box components:
- gRPC serving
- HTTP serving
- Interactive (Swagger) UI
- Incorporates routing capabilities (shadowing, A/B tests); see the manifest sketch after this list
- Depends on a 3rd party ingress controller
- Istio
- Ambassador
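As an illustration, traffic splitting is expressed directly in the SeldonDeployment manifest by giving each predictor a traffic weight. A minimal sketch, assuming placeholder names and model URIs:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: ab-test                            # placeholder name
spec:
  predictors:
    - name: main                           # receives 75% of requests
      traffic: 75
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://my-bucket/model-a   # placeholder URI
    - name: canary                         # receives 25% of requests
      traffic: 25
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://my-bucket/model-b   # placeholder URI
```

A shadow is declared similarly with shadow: true on the second predictor; it receives mirrored traffic and its responses are discarded.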
Quickstart
- Not just for Python
- Has wrappers for Java, R, NodeJS
- Provides an image building tool called s2i (for non-reusable servers); see the build command after this list
- Exposes out-of-the-box metrics for Prometheus
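For the non-reusable route, the build is a single s2i command. A sketch, assuming the Python builder image (seldonio/seldon-core-s2i-python3) and placeholder tags:

```bash
# Build a model image from the current directory with Seldon's s2i builder.
# Builder tag and output image name are placeholders.
s2i build . seldonio/seldon-core-s2i-python3:1.14.0 my-model:0.1
```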
Overview of components
- Two types of model servers:
- Reusable / prepackaged servers: used to serve a family of similar models. The server programmatically retrieves the model itself from cloud storage at runtime.
- Non-reusable servers: the model is packaged with Seldon in a single image.
- There are pre-built servers for major frameworks (e.g. XGBoost, TensorFlow, Hugging Face, scikit-learn)
- If you don’t need to do any custom input transformation, you can deploy these with just a .yaml manifest
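A minimal sketch of such a manifest for the prepackaged scikit-learn server, with placeholder name and modelUri:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model                              # placeholder
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        implementation: SKLEARN_SERVER          # pre-built reusable server
        modelUri: gs://my-bucket/sklearn/iris   # model pulled from cloud storage at runtime
```

Deploying is then just a kubectl apply -f on this file.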
- The SeldonDeployment Kubernetes CRD facilitates configuration and management of models
- Benefit of Seldon over e.g. Flask:
- Out-of-the-box configurable K8s deployments (manifests, ingress, etc.)
- Parametrizable, reusable containers
- Portability to other platforms
- Integrations
- Inference graphs
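A hedged sketch of how an inference graph appears in the manifest: nodes nest via children, so a transformer can feed a model. Names are placeholders, and the transformer node would need a matching container in componentSpecs:

```yaml
# Two-step graph: requests hit the transformer first, then its child model.
graph:
  name: feature-transformer
  type: TRANSFORMER
  children:
    - name: classifier
      type: MODEL
      implementation: SKLEARN_SERVER
      modelUri: gs://my-bucket/model   # placeholder
```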
TODO: Model metadata
TODO: Orchestration (1, 2, 3)
Integrations
Logging
- Configure an external logger in the manifest (sketch below)
- Can configure input and output HTTP payloads to be published as CloudEvents
- Can also publish logs to Kafka for downstream digestion
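The logger is attached per graph node in the manifest. A sketch, with a placeholder sink URL (e.g. a Knative broker):

```yaml
graph:
  name: classifier
  logger:
    mode: all   # log both request and response payloads; "request" / "response" also valid
    url: http://broker-ingress.knative-eventing.svc.cluster.local/seldon/default   # placeholder sink
```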
Batch processing
- Define a batch process on the command line (example after this list)
- Input and output data paths
- Seldon deployment name
- Number of workers
- It will run all the input data through the model and serialize the results
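A sketch of an invocation, using the seldon-batch-processor CLI from the Seldon Python package; host, paths, and names are placeholders:

```bash
seldon-batch-processor \
    --deployment-name iris-model \
    --namespace seldon \
    --host istio-ingressgateway.istio-system.svc.cluster.local:80 \
    --input-data-path input-data.txt \
    --output-data-path output-data.txt \
    --workers 10
```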
Benchmarking (1, 2)
- Metrics are exposed by the service orchestrator
- By default, you can get requests per second (RPS)
- The documentation claims additional metrics, but its “more info” link just points back to the same page
- The notebook doesn’t show them, either
- Custom metrics are created by including a metrics method in the custom Python (or whatever language) wrapper (sketch below)
- They are then accessed either via HTTP (on port 6000 by default) or directly in Kubernetes via Prometheus
- Behind the scenes, Prometheus still uses HTTP; it just scrapes the ports and provides a dashboard
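A minimal sketch of a Python wrapper with custom metrics, using the COUNTER/GAUGE/TIMER dict format from the Python wrapper docs (the model logic is a stub):

```python
class Model:
    """Seldon Python wrapper with custom metrics; predict is a placeholder."""

    def predict(self, X, features_names=None):
        # A real wrapper would run the model here.
        return X

    def metrics(self):
        # Each dict is exposed as a Prometheus metric on the metrics endpoint.
        return [
            {"type": "COUNTER", "key": "my_counter", "value": 1},   # incremented each call
            {"type": "GAUGE", "key": "my_gauge", "value": 100},     # set to the given value
            {"type": "TIMER", "key": "my_timer", "value": 20.2},    # observed duration (ms)
        ]
```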
Native Kafka integration
- All you need to do is choose serverType: kafka (plus input and output topics) in the deployment config (sketch below)
- Seldon will poll the input topic and publish predictions to the output topic
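A hedged sketch of the relevant manifest pieces, modeled on the Kafka example in the Seldon docs; broker and topic names are placeholders:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: kafka-model                       # placeholder
spec:
  serverType: kafka                       # switch the service orchestrator to Kafka mode
  predictors:
    - name: default
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://my-bucket/model    # placeholder
      svcOrchSpec:
        env:
          - name: KAFKA_BROKER
            value: my-broker.kafka:9092   # placeholder broker address
          - name: KAFKA_INPUT_TOPIC
            value: model-input            # topic Seldon polls
          - name: KAFKA_OUTPUT_TOPIC
            value: model-output           # topic Seldon publishes to
```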
TODO: Python wrapper (1, 2)
What’s the business model?
They sell a managed version of Seldon Core called Seldon Deploy