From tutorial video

  • Allegedly framework-agnostic ML serving tool for Kubernetes
  • Integrates with Kubernetes so that deploying an ML model is basically the same as deploying a Kubernetes service (minimal manifest sketch after this list)
  • Deployed models get three out-of-the-box components:
    • gRPC serving
    • HTTP serving
    • Interactive (Swagger) UI
  • Incorporates routing capabilities (shadowing, A/B tests)
  • Depends on a third-party ingress controller
    • Istio
    • Ambassador
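
  A minimal sketch of what that looks like, assuming a scikit-learn model sitting in a hypothetical gs:// bucket (all names illustrative):

    apiVersion: machinelearning.seldon.io/v1
    kind: SeldonDeployment
    metadata:
      name: iris-model
    spec:
      predictors:
      - name: default
        replicas: 1
        graph:
          name: classifier
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/iris    # hypothetical model location

  Applying this with kubectl is the whole deployment; the operator wires up the REST/gRPC endpoints and Swagger UI with no serving code written.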

Quickstart

  • Not just for Python
    • Has wrappers for Java, R, and NodeJS
  • Provides an image-building tool called s2i (for non-reusable servers); sketch after this list
  • Exposes out-of-the-box metrics for Prometheus
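
  Roughly how the s2i flow looks for a Python model, per the pattern in the Seldon docs (builder-image tag and all names are assumptions):

    # .s2i/environment: tells the builder image how to wrap the model
    MODEL_NAME=MyModel          # Python class exposing predict()
    API_TYPE=REST
    SERVICE_TYPE=MODEL
    PERSISTENCE=0

    # Build a servable image from the model code in the current directory
    s2i build . seldonio/seldon-core-s2i-python3:1.14.0 my-model:0.1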

Overview of components

  • Two types of model servers:
    • Reusable / prepackaged servers: used to serve a family of similar models. The server programmatically retrieves the model from cloud storage at runtime.
    • Non-reusable servers: the model is packaged with Seldon in a single image.
  • There are pre-built servers for major frameworks (e.g. XGBoost, TensorFlow, Hugging Face, scikit-learn)
    • If you don’t need to do any custom input transformation, you can deploy these with just a .yaml manifest (see the A/B sketch after this list)
  • SeldonDeployment Kubernetes CRD facilitates configuration and management of models
  • Benefits of Seldon over a hand-rolled server (e.g. Flask):
    • Out-of-the-box configurable K8s deployments (manifests, ingress, etc.)
    • Parametrizable, reusable containers
    • Portability to other platforms
    • Integrations
    • Inference graphs
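
  Tying the routing bullet from earlier to the manifest-only workflow, a hedged sketch of an A/B split across two prepackaged servers (URIs and weights illustrative):

    apiVersion: machinelearning.seldon.io/v1
    kind: SeldonDeployment
    metadata:
      name: iris-ab-test
    spec:
      predictors:
      - name: main                # gets 75% of traffic
        traffic: 75
        graph:
          name: classifier
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/iris-v1    # hypothetical
      - name: candidate           # gets 25% of traffic
        traffic: 25
        graph:
          name: classifier
          implementation: XGBOOST_SERVER
          modelUri: gs://my-bucket/iris-v2    # hypothetical

  The actual traffic splitting is done by the ingress layer, which is presumably why Istio or Ambassador is required.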

TODO: Model metadata

TODO: Orchestration (1, 2, 3)

Integrations

Logging

  • Configure an external logger in the manifest (fragment after this list)
  • Can configure input and output HTTP payloads to be published as CloudEvents
  • Can also publish logs to Kafka for downstream digestion
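
  A sketch of the graph-level logger config (the endpoint URL is a placeholder; per the docs, mode can be all, request, or response):

    # Fragment of a SeldonDeployment predictor spec
    graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://my-bucket/iris      # hypothetical
      logger:
        url: http://my-logger.default/   # any endpoint that accepts CloudEvents
        mode: all                        # log both requests and responses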

Batch processing

  • Define a batch process on the command line (CLI sketch after this list)
    • Input and output data paths
    • Seldon deployment name
    • Number of workers
  • It will run all the input data through the model and serialize the results
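
  Something like the following, with flag names as I recall them from the docs (worth verifying against seldon-batch-processor --help):

    # Run every line of input.txt through the model with 10 parallel workers
    seldon-batch-processor \
      --deployment-name iris-model \
      --namespace seldon \
      --input-data-path input.txt \
      --output-data-path output.txt \
      --workers 10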

Benchmarking (1, 2)

  • Metrics are exposed by the service orchestrator
  • By default, you can get requests per second (RPS)
    • Documentation claims additional metrics, but the “more info” link points back to the same page
    • The notebook doesn’t show them, either
  • Custom metrics are created by including a metrics method in the custom Python (or whatever language) wrapper; see the sketch after this list
  • They are then accessed either via HTTP (on port 6000 by default) or directly in Kubernetes via Prometheus
    • Behind the scenes, Prometheus is still using HTTP; it just scrapes the ports and provides a dashboard
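
  A minimal wrapper sketch with a custom metric (class and metric names made up; the dict format follows Seldon’s custom-metrics convention):

    # MyModel.py: minimal Seldon Python wrapper exposing a custom metric
    class MyModel:
        def predict(self, X, features_names=None):
            # A real model would transform X here; identity for illustration
            return X

        def metrics(self):
            # Collected by the orchestrator and exposed for Prometheus scraping
            return [
                {"type": "COUNTER", "key": "my_requests_total", "value": 1},
                {"type": "GAUGE", "key": "my_gauge", "value": 0.5},
            ]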

Native Kafka integration

  • All you need to do is set serverType: kafka (plus input and output topics) in the deployment config (sketch after this list)
  • Seldon will poll the input topic and publish to the output topic
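
  A sketch of the manifest for this mode (broker address and topic names are assumptions):

    apiVersion: machinelearning.seldon.io/v1
    kind: SeldonDeployment
    metadata:
      name: iris-kafka
    spec:
      serverType: kafka                  # consume from a topic instead of serving HTTP/gRPC
      predictors:
      - name: default
        graph:
          name: classifier
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/iris  # hypothetical
        svcOrchSpec:
          env:
          - name: KAFKA_BROKER
            value: kafka:9092
          - name: KAFKA_INPUT_TOPIC
            value: model-input
          - name: KAFKA_OUTPUT_TOPIC
            value: model-output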

TODO: Python wrapper (1, 2)

What’s the business model?

They sell a managed version of Seldon Core called Seldon Deploy