Using Apache Polaris with FlashBlade

This article looks at how to configure Apache Polaris with FlashBlade as a REST catalog for Apache Iceberg to support Spark and Trino using static S3-compatible credentials.

Summary

Apache Polaris running on Everpure FlashBlade delivers a REST-based Iceberg catalog so Spark and Trino can share lakehouse tables using familiar static S3-compatible credentials.

Apache Polaris is the open source REST catalog for Apache Iceberg that everyone has been watching. Donated to the Apache Software Foundation by Snowflake, it promises what the Iceberg ecosystem has needed for years: a standalone, vendor-neutral catalog that any engine can talk to over a clean REST API.

The value proposition solves a real problem in data infrastructure. One catalog. OAuth2-based access control. Spark, Trino, Flink, and anything else that speaks the Iceberg REST spec can share tables without tight coupling to a specific metastore. It’s the kind of component that makes multi-engine lakehouse architectures actually practical.

Polaris was built to be cloud-native, and one of its most powerful features is credential vending: the ability to issue temporary, scoped S3 credentials to clients via AWS STS. Everpure® FlashBlade® supports STS, but it doesn’t implement the AssumeRole API call. It implements AssumeRoleWithWebIdentity and AssumeRoleWithSAML instead. This means credential vending isn’t available for Apache Polaris on FlashBlade as of this writing, but the rest of the catalog works well once you configure static credentials correctly.

This blog walks through how we got Polaris running on FlashBlade with Spark and Trino, what credential vending does (and what you can do without it), and the configuration details that make it all work.

Credential Vending 101

To understand the configuration, it helps to understand what Polaris is doing when it “vends credentials.”

In a cloud-native deployment on AWS, the flow works like this: A client (say, Spark) authenticates to Polaris via OAuth2 and requests access to a table. Polaris calls AWS STS AssumeRole to generate temporary, scoped credentials that only allow access to the specific S3 paths that the table occupies. It hands those credentials back to the client. The client uses them to read and write data directly to S3, with no long-lived keys floating around and no over-privileged access.
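To make "scoped" concrete, here is a minimal Python sketch of the kind of session policy that could accompany an AssumeRole call for a single table. It illustrates the idea and is not Polaris's actual implementation; the function name, bucket, and table prefix are hypothetical.

```python
import json


def scoped_table_policy(bucket: str, table_prefix: str) -> dict:
    """Build an IAM-style policy limited to one table's S3 prefix.

    Illustrative only: Polaris generates its own policy internally, and the
    exact actions it grants may differ from the ones listed here.
    """
    prefix = table_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads and writes only under the table's prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                # Listing is allowed on the bucket, but only for this prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Hypothetical bucket and table location.
policy = scoped_table_policy("lakehouse", "warehouse/sales/orders")
print(json.dumps(policy, indent=2))
```

A client holding credentials bound to a policy like this can touch one table's files and nothing else, which is the property static keys give up.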

This is genuinely excellent security design. The limitation is that Polaris specifically uses the AssumeRole API, which is the simplest STS flow but also the most AWS-centric.

FlashBlade supports STS through federation-based flows: AssumeRoleWithSAML and AssumeRoleWithWebIdentity (OIDC). These allow FlashBlade to integrate with external identity providers like Okta, Microsoft Entra ID, and Active Directory, which is often a good fit for enterprise environments where security teams already manage identity centrally. For more details, see the Everpure Object Identity Federation documentation (Pure1® login required). But because Polaris hardcodes the AssumeRole path, it cannot use FlashBlade’s STS implementation today.

The practical implication is straightforward: On FlashBlade, clients need to use static S3 credentials for data access instead of Polaris-vended temporary credentials. This is the same credential model that most on-prem Iceberg deployments already use with Hive Metastore. The difference is that you get a modern REST catalog on top.

Figure 1 (panels: Standard AWS Cloud, FlashBlade): The standard Polaris credential vending flow uses AWS STS to issue short-lived tokens. On FlashBlade, we bypass this flow and use static credentials while retaining the REST catalog benefits.

What You Get from Polaris on FlashBlade

Even without credential vending, Polaris brings real improvements over Hive Metastore:

  • Multi-engine access over REST: Spark and Trino both connect to Polaris via the Iceberg REST catalog spec. No Hive Metastore, no Thrift, no shared HDFS state. Each engine authenticates independently via OAuth2 and gets a consistent view of the catalog.
  • Namespace and table governance: Polaris manages namespaces, table metadata, and the mapping between logical tables and physical storage locations. This is the core job of any Iceberg catalog, and it works identically regardless of whether credentials are vended.
  • Catalog-level access control: OAuth2 principals, roles, and scopes still function. You can control which principals see which catalogs and namespaces. The missing piece is per-request credential scoping at the storage layer, not catalog-level authorization.
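As a rough illustration of the OAuth2 step, the Python sketch below builds (but does not send) a client-credentials token request. It assumes the token endpoint path /api/catalog/v1/oauth/tokens and the scope PRINCIPAL_ROLE:ALL; the host name, port, and client values are hypothetical, so check them against your Polaris deployment.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(polaris_url: str, client_id: str, client_secret: str,
                        scope: str = "PRINCIPAL_ROLE:ALL") -> Request:
    """Build an OAuth2 client-credentials request for a Polaris principal.

    The endpoint path and default scope are assumptions for this sketch.
    """
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("utf-8")
    return Request(
        url=f"{polaris_url.rstrip('/')}/api/catalog/v1/oauth/tokens",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical host and principal; nothing is sent over the network here.
req = build_token_request("http://polaris.example.internal:8181",
                          "spark-client-id", "spark-client-secret")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would return a JSON document containing the access token. Spark and Trino perform this exchange themselves when given the credential and scope settings described later.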

How to Configure Polaris with FlashBlade Object

Getting Polaris to work on FlashBlade comes down to one critical decision: how you tell Polaris that STS is unavailable. There are two configuration flags that sound like they do the same thing. Only one of them works.

The server-level feature flag SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION sounds like exactly what you want. It tells Polaris to skip the credential subscoping step entirely. The problem is what it does internally: It returns a completely empty storage access configuration. No S3 endpoint. No path-style access setting. No region. The S3FileIO that Polaris uses for server-side metadata operations defaults to s3.amazonaws.com, sends your FlashBlade access key to AWS, and fails with “Access Key Id does not exist.”

This flag was designed for single-tenant AWS deployments where the server’s default AWS credentials and endpoint are already correct. It assumes you are on AWS. On FlashBlade, it will not work.

The stsUnavailable property is the critical configuration for FlashBlade. Unlike the server-level flag, setting “stsUnavailable”: true inside the storageConfigInfo block ensures Polaris still passes the custom endpoint and path-style access settings to the Iceberg client. This is what makes FlashBlade integration work.

This is a catalog-level property set on the storage configuration when you create the catalog. It tells Polaris to skip the AssumeRole call, but it still populates the storage access configuration with your S3 endpoint, path-style access setting, and region. The S3FileIO gets everything it needs to talk to FlashBlade. Credentials come from environment variables on the Polaris pod.

The catalog creation call looks like this:
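As a minimal sketch rather than the exact call from our deployment, the Python below builds a catalog creation request for the Polaris management API. Only storageConfigInfo and stsUnavailable are named in this article. The management path and the other field names (storageType, allowedLocations, endpoint, pathStyleAccess, default-base-location) are assumptions to verify against the management API spec for your Polaris version, and the catalog name, bucket, and endpoints are hypothetical.

```python
import json
from urllib.request import Request


def build_create_catalog_request(polaris_url: str, token: str, catalog_name: str,
                                 bucket: str, s3_endpoint: str) -> Request:
    """Build (without sending) a catalog creation request for FlashBlade."""
    payload = {
        "catalog": {
            "name": catalog_name,
            "type": "INTERNAL",
            "properties": {"default-base-location": f"s3://{bucket}/"},
            "storageConfigInfo": {
                "storageType": "S3",
                "allowedLocations": [f"s3://{bucket}/"],
                # FlashBlade data VIP instead of the AWS default endpoint.
                "endpoint": s3_endpoint,
                "pathStyleAccess": True,
                "region": "us-east-1",
                # Skip AssumeRole but keep the endpoint settings above.
                "stsUnavailable": True,
            },
        }
    }
    return Request(
        url=f"{polaris_url.rstrip('/')}/api/management/v1/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values; the token comes from the OAuth2 exchange with Polaris.
create_req = build_create_catalog_request(
    "http://polaris.example.internal:8181", "REPLACE_WITH_TOKEN",
    "lakehouse", "iceberg-warehouse", "https://flashblade.example.internal",
)
print(json.dumps(json.loads(create_req.data), indent=2))
```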

The region value is required by the AWS SDK but is meaningless for FlashBlade. Set it to us-east-1 and move on.

Because Polaris cannot vend temporary credentials, every component needs static S3 credentials:

Component | Auth to Polaris | Auth to FlashBlade
Polaris Server | N/A | AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars
Spark | OAuth2 (credential=client_id:secret) | s3.access-key-id / s3.secret-access-key on catalog config
Trino | OAuth2 (oauth2.credential=client_id:secret) | s3.aws-access-key / s3.aws-secret-key env vars

This is the tradeoff. You get the catalog benefits of Polaris with the credential model of static keys. For environments that already manage static S3 credentials (which is most on-prem deployments), this is not a regression. It’s the status quo with a better catalog on top.

Apache Spark 3.x

The key differences from a Hive Metastore configuration: catalog-impl points to the REST catalog, warehouse is a Polaris catalog name (not an S3 path), and you need both OAuth2 credentials for the catalog and static S3 credentials for data access.
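A minimal sketch of those settings, written as a Python helper that renders spark-submit arguments. The property names follow the Iceberg Spark and S3FileIO conventions; the Spark catalog name, Polaris URI, catalog name, and credential values are hypothetical, and the scope value PRINCIPAL_ROLE:ALL is an assumption.

```python
def spark_polaris_conf(spark_catalog: str, polaris_uri: str, polaris_catalog: str,
                       oauth_credential: str, s3_endpoint: str,
                       s3_access_key: str, s3_secret_key: str) -> dict:
    """Return Spark properties for an Iceberg REST catalog backed by FlashBlade."""
    p = f"spark.sql.catalog.{spark_catalog}"
    return {
        p: "org.apache.iceberg.spark.SparkCatalog",
        f"{p}.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
        f"{p}.uri": polaris_uri,
        # A Polaris catalog name, not an S3 path.
        f"{p}.warehouse": polaris_catalog,
        # OAuth2 client credentials for the catalog (client_id:secret).
        f"{p}.credential": oauth_credential,
        f"{p}.scope": "PRINCIPAL_ROLE:ALL",
        # Static S3 credentials and endpoint for data access on FlashBlade.
        f"{p}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{p}.s3.endpoint": s3_endpoint,
        f"{p}.s3.path-style-access": "true",
        f"{p}.s3.access-key-id": s3_access_key,
        f"{p}.s3.secret-access-key": s3_secret_key,
        f"{p}.client.region": "us-east-1",
    }


def as_submit_args(conf: dict) -> list:
    """Flatten the properties into --conf key=value pairs for spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical values throughout.
conf = spark_polaris_conf(
    "polaris", "http://polaris.example.internal:8181/api/catalog", "lakehouse",
    "spark-client-id:spark-client-secret", "https://flashblade.example.internal",
    "FB_ACCESS_KEY", "FB_SECRET_KEY",
)
print(" ".join(as_submit_args(conf)))
```

In practice, read the secrets from a secret store or environment variables rather than hardcoding them as this sketch does.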

Trino (479+)

Trino has a version requirement here. The oauth2.scope property was only added in Trino 454. Without it, Trino sends scope=catalog, which Polaris rejects as invalid_scope. If you’re running a release older than 454 (Trino 435, for example), Polaris integration will not work. We recommend Trino 479. That release requires the native S3 file system (fs.native-s3.enabled=true), because the legacy hive.s3.* properties were removed in it.
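A minimal sketch of a matching Trino catalog file, rendered by a small Python helper. The property names come from the Trino Iceberg connector and native S3 file system. The scope value PRINCIPAL_ROLE:ALL is an assumption, and the host names and the POLARIS_OAUTH_CREDENTIAL environment variable are hypothetical.

```python
def trino_polaris_properties(polaris_uri: str, polaris_catalog: str,
                             s3_endpoint: str) -> str:
    """Render an Iceberg catalog properties file for Trino with Polaris."""
    props = {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": polaris_uri,
        "iceberg.rest-catalog.warehouse": polaris_catalog,
        "iceberg.rest-catalog.security": "OAUTH2",
        # Trino resolves ${ENV:...} from its own environment at startup.
        # The variable holds client_id:secret for the Polaris principal.
        "iceberg.rest-catalog.oauth2.credential": "${ENV:POLARIS_OAUTH_CREDENTIAL}",
        # Without an explicit scope, Polaris rejects the token request.
        "iceberg.rest-catalog.oauth2.scope": "PRINCIPAL_ROLE:ALL",
        # Native S3 file system replaces the legacy hive.s3.* properties.
        "fs.native-s3.enabled": "true",
        "s3.endpoint": s3_endpoint,
        "s3.region": "us-east-1",
        "s3.path-style-access": "true",
        "s3.aws-access-key": "${ENV:AWS_ACCESS_KEY_ID}",
        "s3.aws-secret-key": "${ENV:AWS_SECRET_ACCESS_KEY}",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Hypothetical endpoints; write the result to etc/catalog/<name>.properties.
properties_text = trino_polaris_properties(
    "http://polaris.example.internal:8181/api/catalog", "lakehouse",
    "https://flashblade.example.internal",
)
print(properties_text)
```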

Implementation Gotchas

Getting to a working deployment involved 12 distinct gotchas. Here are the ones most likely to bite you:

Gotcha | Context | Resolution
Docker image tags require -incubating | Docker Hub | apache/polaris:1.3.0 doesn’t exist. Use apache/polaris:1.3.0-incubating.
No CLI binary for admin tool | Bootstrap | Invoke as java -jar /deployments/polaris-admin-tool.jar, not as a polaris command.
Bootstrap is not idempotent | Kubernetes Jobs | Second run throws IllegalArgumentException: already been bootstrapped. Handle explicitly.
SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION broken before 1.3.0 | Versions 1.1.0-1.2.0 | TaskFileIOSupplier ignores the flag entirely. Known bug (apache/polaris#379), fixed in 1.3.0.
Wrong flag for FlashBlade | Server config | Even when the flag works in 1.3.0, it’s wrong for FlashBlade. Use stsUnavailable=true on catalog instead.
Trino scope rejection | Trino < 454 | oauth2.scope property missing. Trino sends scope=catalog, Polaris rejects as invalid_scope.
Minimum version | All deployments | 1.3.0-incubating is the minimum viable version for FlashBlade.
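One way to handle the non-idempotent bootstrap explicitly is a small wrapper that treats the "already been bootstrapped" error as success, so a Kubernetes Job can be re-run safely. The Python below is a sketch under assumptions: the jar path and error text come from the table above, while the bootstrap subcommand arguments (-r and -c), the realm name, and the credential values are assumptions to check against the admin tool's help output.

```python
import subprocess

ALREADY_BOOTSTRAPPED = "already been bootstrapped"


def bootstrap_command(realm: str, client_id: str, client_secret: str) -> list:
    """Assemble the admin tool invocation (arguments are assumptions)."""
    return [
        "java", "-jar", "/deployments/polaris-admin-tool.jar",
        "bootstrap", "-r", realm, "-c", f"{realm},{client_id},{client_secret}",
    ]


def bootstrap_once(command: list, run=subprocess.run) -> str:
    """Run the bootstrap and tolerate a realm that is already bootstrapped.

    `run` is injectable so the logic can be exercised without a JVM.
    """
    result = run(command, capture_output=True, text=True)
    output = (result.stdout or "") + (result.stderr or "")
    if result.returncode == 0:
        return "bootstrapped"
    if ALREADY_BOOTSTRAPPED in output:
        # A previous run already did the work; treat it as success.
        return "already-bootstrapped"
    raise RuntimeError(f"Polaris bootstrap failed: {output.strip()}")
```

Calling bootstrap_once(bootstrap_command("POLARIS", "root", "s3cr3t")) inside the Job's container would then exit cleanly on both the first and later runs, while still failing on any other error.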

Looking Ahead

The most exciting near-term opportunity is closing the credential vending gap entirely. FlashBlade already supports AssumeRoleWithWebIdentity (OIDC), which means it can issue temporary, scoped credentials when presented with a token from an identity provider like Okta or Entra ID. If Polaris added support for OIDC-based credential vending alongside its existing AssumeRole path, the full credential vending flow could work on FlashBlade, with the added benefit of integrating with enterprise identity infrastructure that security teams already manage.

A few areas where the community could move this forward:

  • OIDC credential vending in Polaris. Adding AssumeRoleWithWebIdentity as an alternative to AssumeRole would unlock credential vending for any S3-compatible storage that supports OIDC federation. This is arguably a better model for enterprise environments, regardless of where the storage lives.
  • Documentation for non-AssumeRole deployments. The distinction between SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION and stsUnavailable is critical for anyone running without direct AssumeRole support. Contributing clear documentation for this deployment model would benefit the broader community.
  • Preserving storage config when skipping credential subscoping. The behavior of SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION returning an empty StorageAccessConfig makes sense for its intended AWS use case. But extending it to preserve endpoint and path-style configuration would make the flag more broadly useful without changing its behavior for existing deployments.

Conclusion

Apache Polaris works on FlashBlade today. The REST API, OAuth2 access control, multi-engine support, and clean metadata separation are real, tangible improvements over Hive Metastore for on-prem Iceberg deployments. Credential vending is not yet available because Polaris uses AssumeRole rather than the OIDC federation that FlashBlade supports, but the static credential model that replaces it is already familiar to most on-prem teams.

The path to full credential vending is clear: FlashBlade already speaks OIDC, and Polaris just needs to learn to as well. We’re looking forward to contributing to that effort and making Polaris a first-class catalog for every Iceberg deployment, cloud and on-prem alike.