Introduction
As organizations modernize their analytics platforms, migrating legacy on-premises data warehouses to the cloud has become a common initiative. While cloud services offer scalability and flexibility, the migration itself often presents challenges that are not immediately obvious—especially when dealing with large datasets, limited network bandwidth, and continuously changing source systems.
From an AWS architect’s perspective, successful migrations are less about moving data as fast as possible and more about choosing an architecture that respects real-world constraints. In this post, I’ll walk through a practical approach for migrating a 40 TB Oracle data warehouse to Amazon Redshift, focusing on reliability, cost efficiency, and minimal operational impact.
The Constraints That Shape the Architecture
Before choosing any tools, it’s important to understand the constraints that actually drive the design:
- The source system is an on-premises Oracle data warehouse
- Total data volume is approximately 40 TB
- The database receives small daily updates
- Larger batch updates occur monthly
- Available internet bandwidth is limited to 50 Mbps
- The migration must complete within about 30 days
Under these conditions, a standard online migration approach simply doesn’t work.
Why Network-Only Migration Is Not Practical
At 50 Mbps, transferring 40 TB over the internet would take well over two months, even under ideal, uninterrupted conditions. More importantly, sustained high network usage can negatively impact production workloads.
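The arithmetic behind this constraint is worth making explicit. A quick back-of-the-envelope calculation, assuming the link is fully dedicated to the transfer (which it never is in practice):

```python
# Back-of-the-envelope transfer-time estimate for the initial bulk load.
DATA_TB = 40                      # total warehouse size
LINK_MBPS = 50                    # available internet bandwidth

data_bits = DATA_TB * 10**12 * 8  # decimal terabytes -> bits
seconds = data_bits / (LINK_MBPS * 10**6)
days = seconds / 86_400

print(f"Ideal transfer time: {days:.0f} days")  # roughly 74 days
```

Real-world throughput is lower once protocol overhead, retries, and competing production traffic are factored in, so the practical figure is worse.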
From an architectural standpoint, this immediately rules out:
- VPN-only migrations
- Long-running replication jobs for initial loads
- Direct internet-based bulk exports
A different strategy is required for the initial data movement.
Architectural Pattern: Offline Bulk Load + Online Change Replication
A pattern I often recommend in these scenarios is:
- Offline bulk data transfer for the historical dataset
- Online replication for ongoing changes until cutover
On AWS, this pattern maps cleanly to:
- AWS Snowball Edge for bulk transfer
- AWS Schema Conversion Tool (SCT) for schema and data preparation
- AWS Database Migration Service (DMS) for continuous updates
Step 1: Moving the Bulk Data with AWS Snowball Edge
For large datasets, AWS Snowball Edge is typically the most efficient way to move data when bandwidth is constrained.
The approach is straightforward:
- Request a Snowball Edge device
- Install the AWS SCT extraction agent on a separate on-premises server
- Extract data from the Oracle warehouse directly onto the Snowball device
This avoids saturating the network and allows the bulk of the data to be transferred in parallel with normal business operations.
Once the device is shipped back to AWS, the data is automatically ingested into an Amazon S3 bucket, which acts as the staging layer.
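Sizing the offline transfer follows the same back-of-the-envelope logic. Assuming roughly 80 TB of usable capacity per Snowball Edge Storage Optimized device (verify against the current AWS specifications before ordering), the 40 TB warehouse fits on a single device:

```python
import math

DATA_TB = 40               # data to move offline
USABLE_TB_PER_DEVICE = 80  # assumed usable capacity per Snowball Edge Storage Optimized

devices_needed = math.ceil(DATA_TB / USABLE_TB_PER_DEVICE)
print(f"Snowball Edge devices required: {devices_needed}")  # 1
```

A single device also keeps the logistics simple: one shipment out, one extraction window, one shipment back.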
Step 2: Schema Conversion for Amazon Redshift
Oracle and Amazon Redshift are different database engines with different SQL dialects, data types, and optimization models, so schema conversion is a necessary step.
Using AWS SCT, an architect can:
- Convert Oracle schemas to Redshift-compatible schemas
- Identify unsupported objects early
- Apply Redshift-specific optimizations such as distribution keys and sort keys
Doing this upfront helps avoid performance issues later and ensures the target environment is designed for analytics workloads.
Step 3: Loading Data into Amazon Redshift
With the data staged in S3 and schemas deployed, AWS SCT can be used to load data into Amazon Redshift.
This process leverages Redshift’s native COPY operations, which are optimized for high-throughput ingestion from Amazon S3. Even very large datasets can be loaded efficiently using this method.
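As an illustration, a COPY statement for one staged table might look like the following. The table name, S3 prefix, and IAM role here are hypothetical placeholders, and in practice the statements are generated by SCT's data extraction agents rather than written by hand:

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for data staged in S3 as gzip-compressed CSV."""
    return (
        f"COPY {table}\n"
        f"FROM 's3://{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV GZIP;"
    )

# Hypothetical example values -- substitute your own table, bucket, and role.
sql = build_copy_statement(
    "sales_fact",
    "dw-staging-bucket/oracle-export/sales_fact/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(sql)
```

Because COPY reads many S3 objects in parallel, splitting each table's export into multiple files is what makes the ingestion high-throughput.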
At this point, the majority of the data is already available in Redshift.
Step 4: Handling Daily Updates with AWS DMS
Since the source database continues to receive updates, the migration cannot stop at the initial load.
To handle this, AWS DMS is configured to:
- Capture ongoing changes from the Oracle database
- Replicate only incremental updates to Amazon Redshift
- Operate efficiently over limited bandwidth
Because only daily deltas are transferred, the 50 Mbps link is no longer a bottleneck.
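A minimal DMS table-mapping rule for such a task might look like the following. The schema name is a placeholder, and real migrations usually add transformation rules (for example, lowercasing object names) alongside the selection rule:

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-dw-schema",
      "object-locator": {
        "schema-name": "DW",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
```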
Step 5: Validation and Cutover
Before final cutover, I typically recommend:
- Validating row counts and key aggregates
- Running sample analytical queries in Redshift
- Ensuring all change replication is caught up
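The row-count check is easy to script. A minimal sketch, assuming count queries have already been run on both sides and collected into dictionaries (the table names and counts here are illustrative):

```python
def compare_row_counts(source: dict[str, int], target: dict[str, int]) -> list[str]:
    """Return human-readable mismatches between source and target row counts."""
    mismatches = []
    for table, src_count in source.items():
        tgt_count = target.get(table)
        if tgt_count is None:
            mismatches.append(f"{table}: missing in target")
        elif tgt_count != src_count:
            mismatches.append(f"{table}: source={src_count}, target={tgt_count}")
    return mismatches

# Illustrative counts -- in practice these come from queries on Oracle and Redshift.
oracle_counts = {"sales_fact": 1_200_000, "customer_dim": 45_000}
redshift_counts = {"sales_fact": 1_200_000, "customer_dim": 44_998}

issues = compare_row_counts(oracle_counts, redshift_counts)
print(issues)  # ['customer_dim: source=45000, target=44998']
```

Key aggregates (sums of amount columns, min/max of date columns) can be compared the same way and catch truncation or type-conversion errors that raw counts miss.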
Once validated, the environment can be cut over ahead of the next major monthly update, minimizing risk and downtime.
Conclusion
Large-scale data warehouse migrations are rarely constrained by technology alone—they are shaped by bandwidth limitations, operational risk, and business timelines. From an AWS architect’s point of view, the most effective solutions are those that balance these factors rather than trying to eliminate them.
By combining offline bulk data transfer with online change replication, this migration approach provides a reliable and cost-effective path to Amazon Redshift. AWS services such as Snowball Edge, Schema Conversion Tool, and Database Migration Service work together to simplify complexity while maintaining control over performance and impact.
For architects designing similar migrations, the key takeaway is clear: choose an architecture that fits your constraints, and the migration becomes a predictable engineering exercise instead of a high-risk project.