Introduction
As organizations modernize their analytics platforms, migrating legacy on-premises data warehouses to the cloud has become a common initiative. While cloud services offer scalability and flexibility, the migration itself often presents challenges that are not immediately obvious—especially when dealing with large datasets, limited network bandwidth, and continuously changing source systems.
From an AWS architect’s perspective, successful migrations are less about moving data as fast as possible and more about choosing an architecture that respects real-world constraints. In this post, I’ll walk through a practical approach for migrating a 40 TB Oracle data warehouse to Amazon Redshift, focusing on reliability, cost efficiency, and minimal operational impact.
The Constraints That Shape the Architecture
Before choosing any tools, it’s important to understand the constraints that actually drive the design:
- The source system is an on-premises Oracle data warehouse
- Total data volume is approximately 40 TB
- The database receives small daily updates
- Larger batch updates occur monthly
- Available internet bandwidth is limited to 50 Mbps
- The migration must complete within about 30 days
Under these conditions, a standard online migration approach simply doesn’t work.
Why Network-Only Migration Is Not Practical
At 50 Mbps, transferring 40 TB over the internet would take well over two months, even under ideal, uninterrupted conditions. More importantly, sustained high network usage can negatively impact production workloads.
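The arithmetic behind this constraint is worth making explicit. A quick back-of-the-envelope calculation, assuming the link is fully dedicated to the transfer (which it never is in practice):

```python
# Back-of-the-envelope transfer-time estimate for the initial bulk load.
DATA_TB = 40                      # total warehouse size
LINK_MBPS = 50                    # available internet bandwidth

data_bits = DATA_TB * 10**12 * 8  # decimal terabytes -> bits
seconds = data_bits / (LINK_MBPS * 10**6)
days = seconds / 86_400

print(f"Ideal transfer time: {days:.0f} days")  # roughly 74 days
```

Real-world throughput is lower once protocol overhead, retries, and competing production traffic are factored in, so the practical figure is worse.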
From an architectural standpoint, this immediately rules out:
- VPN-only migrations
- Long-running replication jobs for initial loads
- Direct internet-based bulk exports
A different strategy is required for the initial data movement.
Architectural Pattern: Offline Bulk Load + Online Change Replication
A pattern I often recommend in these scenarios is:
- Offline bulk data transfer for the historical dataset
- Online replication for ongoing changes until cutover
On AWS, this pattern maps cleanly to:
- AWS Snowball Edge for bulk transfer
- AWS Schema Conversion Tool (SCT) for schema and data preparation
- AWS Database Migration Service (DMS) for continuous updates
Step 1: Moving the Bulk Data with AWS Snowball Edge
For large datasets, AWS Snowball Edge is typically the most efficient way to move data when bandwidth is constrained.
The approach is straightforward:
- Request a Snowball Edge device
- Install the AWS SCT extraction agent on a separate on-premises server
- Extract data from the Oracle warehouse directly onto the Snowball device
This avoids saturating the network and allows the bulk of the data to be transferred in parallel with normal business operations.
Once the device is shipped back to AWS, the data is automatically ingested into an Amazon S3 bucket, which acts as the staging layer.
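Sizing the offline transfer follows the same back-of-the-envelope logic. Assuming roughly 80 TB of usable capacity per Snowball Edge Storage Optimized device (verify against the current AWS specifications before ordering), the 40 TB warehouse fits on a single device:

```python
import math

DATA_TB = 40               # data to move offline
USABLE_TB_PER_DEVICE = 80  # assumed usable capacity per Snowball Edge Storage Optimized

devices_needed = math.ceil(DATA_TB / USABLE_TB_PER_DEVICE)
print(f"Snowball Edge devices required: {devices_needed}")  # 1
```

A single device also keeps the logistics simple: one shipment out, one extraction window, one shipment back.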
Step 2: Schema Conversion for Amazon Redshift
Oracle and Amazon Redshift are different database engines with different SQL dialects, data types, and optimization models, so schema conversion is a necessary step.
Using AWS SCT, an architect can:
- Convert Oracle schemas to Redshift-compatible schemas
- Identify unsupported objects early
- Apply Redshift-specific optimizations such as distribution keys and sort keys
Doing this upfront helps avoid performance issues later and ensures the target environment is designed for analytics workloads.
Step 3: Loading Data into Amazon Redshift
With the data staged in S3 and schemas deployed, AWS SCT can be used to load data into Amazon Redshift.
This process leverages Redshift’s native COPY operations, which are optimized for high-throughput ingestion from Amazon S3. Even very large datasets can be loaded efficiently using this method.
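As an illustration, a COPY statement for one staged table might look like the following. The table name, S3 prefix, and IAM role here are hypothetical placeholders, and in practice the statements are generated by SCT's data extraction agents rather than written by hand:

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for data staged in S3 as gzip-compressed CSV."""
    return (
        f"COPY {table}\n"
        f"FROM 's3://{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV GZIP;"
    )

# Hypothetical example values -- substitute your own table, bucket, and role.
sql = build_copy_statement(
    "sales_fact",
    "dw-staging-bucket/oracle-export/sales_fact/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(sql)
```

Because COPY reads many S3 objects in parallel, splitting each table's export into multiple files is what makes the ingestion high-throughput.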
At this point, the majority of the data is already available in Redshift.
Step 4: Handling Daily Updates with AWS DMS
Since the source database continues to receive updates, the migration cannot stop at the initial load.
To handle this, AWS DMS is configured to:
- Capture ongoing changes from the Oracle database
- Replicate only incremental updates to Amazon Redshift
- Operate efficiently over limited bandwidth
Because only daily deltas are transferred, the 50 Mbps link is no longer a bottleneck.
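A minimal DMS table-mapping rule for such a task might look like the following. The schema name is a placeholder, and real migrations usually add transformation rules (for example, lowercasing object names) alongside the selection rule:

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-dw-schema",
      "object-locator": {
        "schema-name": "DW",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
```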
Step 5: Validation and Cutover
Before final cutover, I typically recommend:
- Validating row counts and key aggregates
- Running sample analytical queries in Redshift
- Ensuring all change replication is caught up
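The row-count check is easy to script. A minimal sketch, assuming count queries have already been run on both sides and collected into dictionaries (the table names and counts here are illustrative):

```python
def compare_row_counts(source: dict[str, int], target: dict[str, int]) -> list[str]:
    """Return human-readable mismatches between source and target row counts."""
    mismatches = []
    for table, src_count in source.items():
        tgt_count = target.get(table)
        if tgt_count is None:
            mismatches.append(f"{table}: missing in target")
        elif tgt_count != src_count:
            mismatches.append(f"{table}: source={src_count}, target={tgt_count}")
    return mismatches

# Illustrative counts -- in practice these come from queries on Oracle and Redshift.
oracle_counts = {"sales_fact": 1_200_000, "customer_dim": 45_000}
redshift_counts = {"sales_fact": 1_200_000, "customer_dim": 44_998}

issues = compare_row_counts(oracle_counts, redshift_counts)
print(issues)  # ['customer_dim: source=45000, target=44998']
```

Key aggregates (sums of amount columns, min/max of date columns) can be compared the same way and catch truncation or type-conversion errors that raw counts miss.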
Once validated, the environment can be cut over ahead of the next major monthly update, minimizing risk and downtime.
Conclusion
Large-scale data warehouse migrations are rarely constrained by technology alone—they are shaped by bandwidth limitations, operational risk, and business timelines. From an AWS architect’s point of view, the most effective solutions are those that balance these factors rather than trying to eliminate them.
By combining offline bulk data transfer with online change replication, this migration approach provides a reliable and cost-effective path to Amazon Redshift. AWS services such as Snowball Edge, Schema Conversion Tool, and Database Migration Service work together to simplify complexity while maintaining control over performance and impact.
For architects designing similar migrations, the key takeaway is clear: choose an architecture that fits your constraints, and the migration becomes a predictable engineering exercise instead of a high-risk project.