Using the MLflow Python SDK with Authentication and RBAC

On Alauda AI the MLflow Tracking Server runs behind single sign-on and multi-tenancy: an OAuth proxy authenticates every caller, and the server records each run under the calling user and authorizes it against Kubernetes RBAC. This guide shows how to drive the stock MLflow Python SDK through that OAuth proxy with your own identity. It uses the OAuth2 password grant to obtain a token from a username and password, so no browser is needed, and the client never connects to the MLflow container port directly.

Platform setup (administrator, one-time)

The password grant needs two settings, which an administrator enables once:

  • Accept bearer tokens at the proxy. Add --skip-jwt-bearer-tokens=true to the MLflow OAuth proxy so it accepts a Dex OIDC token alongside browser sessions:

    # MLflow plugin values
    auth:
      oauth:
        extraArgs:
          - --skip-jwt-bearer-tokens=true

  • Allow the password grant. Dex must have the password connector enabled (enablePasswordDB: true), and the OAuth client you authenticate with must list password in its grantTypes. Register a dedicated client for this rather than the platform's interactive-login client.

Prerequisites

  • mlflow 3.10 or later (pip install "mlflow>=3.10"). Workspace selection (mlflow.set_workspace) is a 3.10+ feature.
  • A platform username and password — ideally a dedicated service account, not a person's login — that can access the target workspace (see Workspace Access).
  • The Dex client id and secret allowed to use the password grant (from your administrator).

How authentication works

Two layers sit in front of your runs:

  1. The OAuth proxy (oauth2-proxy) authenticates the request. With --skip-jwt-bearer-tokens, it accepts a Dex-issued OIDC id token sent as Authorization: Bearer ….
  2. The MLflow server's kubernetes-auth plugin reads your identity from that token, records it as the run owner, and authorizes it against your Kubernetes permissions in the workspace.

The client always goes through the OAuth proxy — never connect to the MLflow container port directly.

Connect the SDK

1. Mint an id token with the password grant

Exchange the username and password for a Dex id token in a single call (no browser, no cookie):

export ID_TOKEN=$(curl -sk "https://<platform>/dex/token" \
  -d grant_type=password \
  --data-urlencode "username=$MLFLOW_USERNAME" \
  --data-urlencode "password=$MLFLOW_PASSWORD" \
  -d scope="openid email groups" \
  -d client_id="$DEX_CLIENT_ID" --data-urlencode "client_secret=$DEX_CLIENT_SECRET" \
  | jq -r .id_token)
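When shelling out to curl is inconvenient (for example inside a pipeline component), the same exchange can be made from Python. The following is a minimal sketch using only the standard library, not a definitive client: the helper names build_token_form and fetch_id_token are hypothetical, and the form fields simply mirror the curl call above. Unlike curl -k, urllib verifies the TLS certificate by default, so this assumes the platform certificate is trusted.

```python
import json
import urllib.parse
import urllib.request


def build_token_form(username, password, client_id, client_secret):
    """Return the form fields for an OAuth2 password-grant request."""
    return {
        "grant_type": "password",
        "username": username,
        "password": password,
        "scope": "openid email groups",
        "client_id": client_id,
        "client_secret": client_secret,
    }


def fetch_id_token(token_url, form, timeout=30):
    """POST the form to the token endpoint and return the stripped id token."""
    body = urllib.parse.urlencode(form).encode("utf-8")
    request = urllib.request.Request(token_url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if "id_token" not in payload:
        # Report only the key names so no secret material ends up in logs.
        raise RuntimeError(f"token endpoint returned no id_token: {sorted(payload)}")
    return payload["id_token"].strip()


# Usage (not executed here; the values come from your environment or a Secret):
#   form = build_token_form(os.environ["MLFLOW_USERNAME"], os.environ["MLFLOW_PASSWORD"],
#                           os.environ["DEX_CLIENT_ID"], os.environ["DEX_CLIENT_SECRET"])
#   os.environ["ID_TOKEN"] = fetch_id_token("https://<platform>/dex/token", form)
```

Keeping the form construction separate from the network call makes it possible to inspect exactly what is sent without contacting Dex.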

2. Point the SDK at the MLflow route with the token

The SDK reads MLFLOW_TRACKING_TOKEN and sends it as Authorization: Bearer …:

import os
import mlflow

os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["ID_TOKEN"].strip()  # → Authorization: Bearer
mlflow.set_tracking_uri("http://mlflow-tracking-server.kubeflow:5000")  # in-cluster Service (fronted by the OAuth proxy)
mlflow.set_workspace("team-a")                 # workspace namespace → X-MLFLOW-WORKSPACE
mlflow.set_experiment("my-experiment")

with mlflow.start_run(run_name="sdk-quickstart") as run:
    mlflow.log_param("learning_rate", 2e-4)
    mlflow.log_metric("loss", 0.123)
    print("run:", run.info.run_id)

The run appears under Alauda AI → Tools → MLFlow, owned by the username you authenticated as. (Verified end-to-end on a secured install: the run owner is the token's user identity.)

Use the in-cluster Service URL http://mlflow-tracking-server.kubeflow:5000 when the client runs inside the cluster (pipeline components, Workbench notebooks). From outside the cluster, point at the platform route https://<platform>/clusters/<cluster>/mlflow instead — both reach the same OAuth proxy.

WARNING

  • The password grant sends the password to the token endpoint, so use a dedicated service account and keep the credentials and client secret in a Kubernetes Secret, never in code.
  • Always .strip() the token: a trailing newline produces Invalid … character(s) in header value: 'Bearer …\n'.
  • The id token expires (24 h by default), so re-run step 1 to refresh it for long-running jobs.
  • If you use the external HTTPS route and your machine does not trust the platform certificate, set MLFLOW_TRACKING_INSECURE_TLS=true.
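For long-running jobs it can help to check how much lifetime a token has left before deciding to re-run step 1. Because an OIDC id token is a JWT, its exp claim can be read locally. The sketch below is illustrative only: the helper names and the 300-second safety margin are assumptions, and the payload is decoded without verifying the signature, so it is suitable for diagnostics but not for trusting a token. The server still performs the real validation.

```python
import base64
import json
import time


def decode_jwt_claims(token):
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token, now=None):
    """Return the seconds left until the exp claim (negative if expired)."""
    claims = decode_jwt_claims(token)
    current = time.time() if now is None else now
    return claims["exp"] - current


def needs_refresh(token, margin_seconds=300, now=None):
    """True when the token expires within margin_seconds (assumed margin)."""
    return seconds_until_expiry(token, now=now) < margin_seconds
```

The same decode_jwt_claims helper can also be used to inspect the aud claim when the proxy rejects a token.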

Selecting a workspace

Runs are recorded in the workspace you select; if you select none, the server's default workspace is used. Either of the following sets it (the SDK turns it into the X-MLFLOW-WORKSPACE header):

  • mlflow.set_workspace("team-a") in code,
  • the MLFLOW_WORKSPACE=team-a environment variable.

You can only use a workspace your account has access to; see Workspace Access.
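To see what the SDK puts on the wire, it can be useful to send one request by hand with the same two headers. The sketch below is a debugging aid under stated assumptions, not part of the SDK: build_headers and search_experiments are hypothetical helpers, and it assumes the standard MLflow REST path /api/2.0/mlflow/experiments/search is reachable under the same base URL you pass to mlflow.set_tracking_uri.

```python
import json
import urllib.request


def build_headers(id_token, workspace=None):
    """Return the headers described above: bearer token plus workspace."""
    headers = {
        "Authorization": f"Bearer {id_token.strip()}",
        "Content-Type": "application/json",
    }
    if workspace:
        headers["X-MLFLOW-WORKSPACE"] = workspace
    return headers


def search_experiments(base_url, headers, max_results=5, timeout=30):
    """POST a small experiment search and return the decoded JSON response."""
    url = base_url.rstrip("/") + "/api/2.0/mlflow/experiments/search"
    body = json.dumps({"max_results": max_results}).encode("utf-8")
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

If the workspace argument is omitted, no workspace header is sent and the server's default workspace applies, as described above.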

Registering models

The model registry is workspace-scoped and authorized the same way, so the usual SDK calls work once connected:

mlflow.set_workspace("team-a")
with mlflow.start_run():
    # sk_model is a fitted scikit-learn estimator defined earlier in your code
    mlflow.sklearn.log_model(sk_model, name="model", registered_model_name="fraud-detector")

Promote the registered version to Staging or Production from the MLflow UI.

Interactive alternative: browser session

If you cannot use the password grant (for example you only have an interactive SSO login), present your browser session instead — this works without the --skip-jwt-bearer-tokens setting. Sign in at Alauda AI → Tools → MLFlow, copy the _oauth2_proxy cookie from the browser developer tools (Application/Storage → Cookies; include any _oauth2_proxy_N chunks, joined with ; ), and attach it to every request with a header provider:

import os
import mlflow
from mlflow.tracking.request_header.abstract_request_header_provider import RequestHeaderProvider
from mlflow.tracking.request_header.registry import _request_header_provider_registry

class ProxySessionHeader(RequestHeaderProvider):
    def in_context(self):
        return bool(os.environ.get("MLFLOW_PROXY_COOKIE"))      # export MLFLOW_PROXY_COOKIE='_oauth2_proxy=<value>'
    def request_headers(self):
        return {"Cookie": os.environ["MLFLOW_PROXY_COOKIE"]}

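# The registry name starts with an underscore: it is private to MLflow and may change between releases.
_request_header_provider_registry.register(ProxySessionHeader)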
mlflow.set_tracking_uri("https://<platform>/clusters/<cluster>/mlflow")
mlflow.set_workspace("team-a")

The session cookie expires — copy a fresh one when calls start returning a login redirect.
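Assembling the cookie value by hand is error-prone when the session is split into chunks. The following sketch shows one way to build the MLFLOW_PROXY_COOKIE value from the name/value pairs copied out of the browser. The helper name join_proxy_cookies is hypothetical, and the assumption that chunk names end in a numeric suffix follows the _oauth2_proxy_N pattern mentioned above.

```python
def join_proxy_cookies(cookies, base_name="_oauth2_proxy"):
    """Build a Cookie header value from the session cookie and its chunks."""

    def chunk_order(name):
        # The bare session cookie sorts first, then chunks by numeric suffix.
        suffix = name[len(base_name):].lstrip("_")
        return (0, 0) if suffix == "" else (1, int(suffix))

    names = [
        name
        for name in cookies
        if name == base_name
        or (name.startswith(base_name + "_") and name[len(base_name) + 1:].isdigit())
    ]
    if not names:
        raise ValueError(f"no {base_name} cookie found")
    ordered = sorted(names, key=chunk_order)
    return "; ".join(f"{name}={cookies[name]}" for name in ordered)
```

The result can be assigned to os.environ["MLFLOW_PROXY_COOKIE"] before the header provider is registered. Cookies that do not match the session name or a numbered chunk are ignored.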

Troubleshooting

Symptom | Check
/dex/token returns unsupported_grant_type / "password grant … not allowed" | The Dex client does not permit the password grant. Use a client whose grantTypes include password (see Platform setup).
Call returns HTML or a redirect (302 to the login page) | The OAuth proxy rejected the bearer token. Confirm --skip-jwt-bearer-tokens is enabled and the token is a valid Dex id token (aud = the proxy's client). For the cookie alternative, your _oauth2_proxy value is missing or expired.
Invalid … character(s) in header value: 'Bearer …\n' | The token has trailing whitespace. Set MLFLOW_TRACKING_TOKEN to the .strip()-ed value.
Failed to query /api/3.0/mlflow/server-info | The SDK could not reach the server through the proxy — verify the tracking URI is the platform MLflow route and the token is valid.
403 PERMISSION_DENIED | Your account lacks access to the workspace namespace. Request access to the workspace (see Workspace Access); no ServiceAccount is involved.
Run shows the wrong owner or workspace | The owner is your authenticated identity; the workspace is set_workspace() / MLFLOW_WORKSPACE (else the server default). Check both.
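When debugging from a script, the first rows of the checks above can be approximated by looking at the raw HTTP response. The sketch below is a rough triage helper under stated assumptions: diagnose_response is a hypothetical name, and the matching rules (redirect status codes, an HTML content type, the PERMISSION_DENIED and unsupported_grant_type strings) are heuristics derived from the symptoms listed above, not an exhaustive classifier.

```python
def diagnose_response(status_code, content_type="", body=""):
    """Map a raw HTTP response to the closest troubleshooting hint."""
    content_type = content_type.lower()
    if status_code in (301, 302, 303, 307, 308) or "text/html" in content_type:
        return "proxy rejected credentials: check --skip-jwt-bearer-tokens, the token, or the cookie"
    if "unsupported_grant_type" in body:
        return "Dex client does not permit the password grant"
    if status_code == 403 or "PERMISSION_DENIED" in body:
        return "no access to the workspace namespace: request workspace access"
    if 200 <= status_code < 300:
        return "ok"
    return "unrecognized: check the tracking URI and the token"
```

A caller would pass the status code, the Content-Type header, and the response text from whichever HTTP client it uses.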