
Cutting data movement costs with SQL Server 2025 PolyBase and Amazon FSxN S3 access points

Carine Ngwekwe

Modern organizations increasingly recognize that data movement, whether through ETL/ELT pipelines, ingestion workflows, replication, or system-to-system integration, has become one of the most significant hidden cost drivers in their data platforms. As enterprises scale, they adopt more tools, onboard more data sources, and support more real-time workloads, all of which dramatically increase the amount of data shuffled across environments.

To combat this, enterprises are shifting away from “copy-first” architectures and embracing data virtualization, in which compute moves to data rather than the other way around. This approach becomes especially powerful when combined with SQL Server PolyBase and Amazon FSx for NetApp ONTAP (FSxN) S3 Access Points, enabling seamless querying of external data without the traditional ETL overhead.

With PolyBase, SQL Server can connect to systems such as Oracle, Teradata, MongoDB, Azure Cosmos DB, Azure Blob Storage, Azure Data Lake Storage, S3-compatible object storage, and more (Hadoop connectivity was available in releases before SQL Server 2022), all using standard T-SQL. Instead of extracting the data into SQL Server, PolyBase creates external tables that reference the remote data, so SQL Server can process queries as if the data were local.
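To make the external-table idea concrete, here is a minimal Python sketch that only prints the T-SQL PolyBase uses for S3-compatible storage: a database scoped credential with `IDENTITY = 'S3 Access Key'`, an external data source with an `s3://` location, a Parquet file format, and an external table. The helper `polybase_s3_ddl`, the endpoint, keys, bucket path, and object names are hypothetical placeholders. It assumes PolyBase is already enabled and a database master key exists, and how an FSxN S3 access point's endpoint and alias map onto `LOCATION` is an assumption to verify against the Microsoft and AWS documentation.

```python
import re

# Hypothetical helper, not part of any SQL Server or AWS tooling. It renders the
# PolyBase T-SQL that exposes files behind an S3-compatible endpoint (for example,
# an FSxN S3 access point) as an external table. Endpoint, keys, path, and object
# names are illustrative placeholders. Prerequisites (not emitted here): PolyBase
# enabled via sp_configure and a database master key in the target database.

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name: str) -> str:
    """Accept only plain identifiers, then bracket-quote them for T-SQL."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f"[{name}]"


def _literal(value: str) -> str:
    """Render a T-SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def polybase_s3_ddl(endpoint: str, access_key: str, secret_key: str,
                    location: str, table: str,
                    columns: list[tuple[str, str]]) -> str:
    """Return credential, data source, file format, and external table DDL.

    Column types are passed through verbatim, so they must come from trusted input.
    """
    cred, source, fmt = f"{table}_cred", f"{table}_src", f"{table}_parquet"
    cols = ",\n    ".join(f"{_ident(name)} {sql_type}" for name, sql_type in columns)
    return "\n".join([
        f"CREATE DATABASE SCOPED CREDENTIAL {_ident(cred)}",
        f"    WITH IDENTITY = 'S3 Access Key', SECRET = {_literal(access_key + ':' + secret_key)};",
        f"CREATE EXTERNAL DATA SOURCE {_ident(source)}",
        f"    WITH (LOCATION = {_literal('s3://' + endpoint + '/')}, CREDENTIAL = {_ident(cred)});",
        f"CREATE EXTERNAL FILE FORMAT {_ident(fmt)} WITH (FORMAT_TYPE = PARQUET);",
        f"CREATE EXTERNAL TABLE {_ident(table)} (\n    {cols}\n)",
        f"    WITH (LOCATION = {_literal(location)}, DATA_SOURCE = {_ident(source)}, FILE_FORMAT = {_ident(fmt)});",
    ])


if __name__ == "__main__":
    print(polybase_s3_ddl(
        endpoint="s3-endpoint.example.com:443",  # placeholder endpoint
        access_key="EXAMPLEKEYID", secret_key="example-secret",
        location="/sales/2025/", table="ext_sales",
        columns=[("order_id", "INT"), ("amount", "DECIMAL(10, 2)")],
    ))
```

Generating the DDL keeps identifiers and secrets consistently quoted. The statements themselves would still be run in SQL Server, for example from SSMS or sqlcmd.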

Microsoft highlights that one of PolyBase's core benefits is the ability to keep data in its original location and format, minimizing the ETL processes traditionally used to move data. When possible, PolyBase pushes computation down to the external source and retrieves only the results it needs. This reduces network load, avoids duplicated storage, and removes the need for multiple overlapping pipelines. In distributed and hybrid environments, PolyBase acts as a unified query layer, improving agility and enabling real-time access to diverse data systems without the overhead of managing complex data flows.
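PolyBase decides on its own when to push work to a source, so the following is not its implementation. It is a toy Python model, built around a hypothetical `RemoteStore` class, that illustrates why pushing a filter down to the source reduces transfer: both queries return the same rows, but only the pushed-down version avoids shipping rows that do not match.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

Row = dict


@dataclass
class RemoteStore:
    """Toy stand-in for an external source; counts rows sent to the caller."""
    rows: list
    rows_shipped: int = 0

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
        # When a predicate is supplied, it runs "at the source", so only matching
        # rows are counted as crossing the network.
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def query_without_pushdown(store: RemoteStore, predicate: Callable[[Row], bool]) -> list:
    # Every row is shipped, then filtered locally by the querying engine.
    return [row for row in store.scan() if predicate(row)]


def query_with_pushdown(store: RemoteStore, predicate: Callable[[Row], bool]) -> list:
    # The filter is handed to the source; only qualifying rows travel.
    return list(store.scan(predicate))


def is_eu(row: Row) -> bool:
    return row["region"] == "EU"


if __name__ == "__main__":
    data = [{"id": i, "region": "EU" if i % 10 == 0 else "US"} for i in range(1000)]
    full, pushed = RemoteStore(data), RemoteStore(data)
    assert query_without_pushdown(full, is_eu) == query_with_pushdown(pushed, is_eu)
    print(full.rows_shipped, pushed.rows_shipped)  # 1000 100
```

The result sets are identical; only the number of rows that cross the "network" differs, which is the effect the paragraph above describes.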

As Microsoft continues to enhance PolyBase, with native support for Parquet and Delta files, S3-compatible and Azure storage sources, and generic ODBC connectors available in SQL Server 2025, the technology is becoming central to modern data architectures that aim to reduce data movement, simplify integration, and lower overall platform costs. It becomes even more impactful when combined with Amazon FSx for NetApp ONTAP (FSxN) S3 Access Points, which expose ONTAP file system data through the S3 API without requiring users to copy or restructure it.

FSxN S3 Access Points let applications work with file data as if it were stored natively in S3, unlocking compatibility with AWS analytics, AI/ML, and serverless services without the historical need to duplicate NFS/SMB datasets into S3 buckets. Because access points provide direct read/write S3 access to FSxN volumes, SQL Server PolyBase can use these S3 endpoints to query file-based data in place.

FSxN enforces authorization at both the S3 level and the file system level, securing access through IAM policies and file system identities. Together, PolyBase and FSxN S3 Access Points form a modern data-access architecture that cuts ETL costs, eliminates redundant pipelines, and lets SQL Server analyze cloud-accessible file data without moving it, so organizations can gain insights while keeping both data gravity and budgets under control.
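The two authorization layers can be pictured as a logical AND. In this toy Python model, every class and field is a hypothetical simplification: `AccessPointPolicy` stands in for IAM and access point policies, and `FileSystemUser` stands in for the ONTAP identity and file permissions applied to the request. Treating object keys as file paths within the volume is also an assumption. A request succeeds only if both layers allow it, and either layer can deny on its own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class S3Request:
    principal_arn: str  # IAM principal calling the access point
    action: str         # S3 action, e.g. "s3:GetObject"
    key: str            # object key, assumed here to mirror a path in the volume


@dataclass
class AccessPointPolicy:
    """Layer 1 (hypothetical simplification): IAM / access point policy grants."""
    grants: set = field(default_factory=set)  # {(principal_arn, action)}

    def allows(self, req: S3Request) -> bool:
        return (req.principal_arn, req.action) in self.grants


@dataclass
class FileSystemUser:
    """Layer 2 (hypothetical simplification): the file system identity used for
    the request and the path prefixes its file permissions let it read or write."""
    name: str
    readable: tuple = ()
    writable: tuple = ()

    def permits(self, action: str, key: str) -> bool:
        if action == "s3:GetObject":
            prefixes = self.readable
        elif action == "s3:PutObject":
            prefixes = self.writable
        else:
            return False  # any other action is denied in this toy model
        return any(key.startswith(p) for p in prefixes)


def authorize(req: S3Request, policy: AccessPointPolicy, fs_user: FileSystemUser) -> bool:
    # Access requires BOTH layers to agree; either one can deny on its own.
    return policy.allows(req) and fs_user.permits(req.action, req.key)


if __name__ == "__main__":
    role = "arn:aws:iam::111122223333:role/sql-polybase"  # example account ID
    policy = AccessPointPolicy({(role, "s3:GetObject")})
    user = FileSystemUser("sqlsvc", readable=("sales/",))
    print(authorize(S3Request(role, "s3:GetObject", "sales/2025/q1.parquet"), policy, user))  # True
    print(authorize(S3Request(role, "s3:GetObject", "hr/payroll.parquet"), policy, user))     # False: file layer denies
    print(authorize(S3Request(role, "s3:PutObject", "sales/new.parquet"), policy, user))      # False: IAM layer denies
```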

How PolyBase minimizes data movement

  • Queries external data in place using T-SQL, eliminating the need to ingest or copy data into SQL Server.
  • Uses external tables to virtualize remote datasets, so SQL Server can treat external files or object storage as local tables without duplicating data.
  • Keeps data in its original location and format, significantly reducing reliance on the ETL/ELT processes traditionally used to move data.
  • Supports S3-compatible object storage, so SQL Server can read and write data stored behind S3 endpoints (including FSxN S3 Access Points) instead of copying it into relational storage.
  • Pushes query operations down to external systems when possible, reducing the amount of data transferred over the network.
  • Enables federated queries across heterogeneous data sources (e.g., Azure Blob Storage, S3-compatible storage, or Oracle), eliminating the need to consolidate datasets into a single warehouse.

How PolyBase integrates with FSxN S3 access points

When combined, SQL Server PolyBase and FSxN S3 Access Points create a powerful architecture that dramatically reduces data movement.

Figure: AI and CVO data flow overview

An FSxN S3 access point is a named endpoint that controls and simplifies how different applications or users access data. Instead of maintaining one large bucket policy, you can give each application that needs the dataset its own access point with its own policy, so one workflow's permissions do not interfere with another's. By simplifying access, S3 access points help teams reach new insights and make data-driven decisions faster.
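As a sketch of splitting one large policy into per-application ones, the Python helper below builds an access point policy document for each application. It uses the standard S3 access point ARN patterns, `arn:aws:s3:<region>:<account-id>:accesspoint/<name>` for listing and `<access point ARN>/object/<key>` for object actions. That FSxN-attached access points accept the same patterns is an assumption, as are the helper name `access_point_policy`, the region, the example account ID 111122223333, the role names, and the prefixes.

```python
import json

# Hypothetical helper: one access point policy per application instead of one
# large bucket policy. ARN formats follow the standard S3 access point pattern;
# applying them unchanged to FSxN-attached access points is an assumption.


def access_point_policy(region: str, account_id: str, access_point: str,
                        principal_arn: str, prefix: str = "",
                        allow_write: bool = False) -> dict:
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{access_point}"
    object_actions = ["s3:GetObject"] + (["s3:PutObject"] if allow_write else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing is authorized against the access point itself.
                "Sid": "List",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": ap_arn,
            },
            {   # Object actions are authorized against <access point ARN>/object/<key>.
                "Sid": "Objects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": object_actions,
                "Resource": f"{ap_arn}/object/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Illustrative applications, each with its own access point and scope.
    apps = {
        "polybase-ap": ("arn:aws:iam::111122223333:role/sql-polybase", "sales/", False),
        "ml-ap": ("arn:aws:iam::111122223333:role/ml-training", "datasets/", True),
    }
    for ap_name, (role, prefix, write) in apps.items():
        policy = access_point_policy("us-east-1", "111122223333", ap_name, role, prefix, write)
        print(ap_name, json.dumps(policy, indent=2))
```

In this sketch `s3:ListBucket` is not limited to the prefix; adding an `s3:prefix` condition would narrow listing as well.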

  • PolyBase can query S3-compatible storage directly, so SQL Server can query FSxN file system data exposed over S3.
  • FSxN S3 Access Points eliminate the need to migrate or duplicate file shares into S3 buckets.
  • Organizations gain real-time access to file-based datasets through T-SQL without building complex or costly pipelines.
  • Dual-layer access controls (IAM + ONTAP security) provide strong governance without additional infrastructure.

Conclusion

Data movement has become a growing but often overlooked cost driver in enterprise data platforms. By integrating SQL Server PolyBase with FSxN S3 Access Points, organizations can build a modern, cost-efficient, zero-ETL architecture that reduces operational overhead, accelerates analytics, improves governance, and future-proofs the data environment for AI-driven workloads.

To explore more, read: Cutting Data Movement Costs with SQL Server 2025 PolyBase and Amazon FSxN S3 Access Points in the NetApp Community.


Carine Ngwekwe

Carine Ngwekwe is a solutions engineer with more than 7 years of experience in on-premises and cloud SQL Server database solutions. She joined NetApp in 2021, working on SnapCenter with SQL Server databases across the NetApp product portfolio. Carine focuses on designing and implementing high-availability and disaster recovery solutions with SQL Server databases, integrating them with SnapCenter by using NetApp products. In her free time, she enjoys trying new cuisines and exploring the beautiful playgrounds and parks in the Research Triangle Park area in North Carolina.
