In a recent project, I had to migrate 1 TB of data from an NFS file system hosted on an EC2 server in one AWS account to an EFS file system in a separate AWS account. We considered several approaches, including a central EC2 server with both NFS and EFS mounted so that the data could be copied with an rsync command. While this method seemed straightforward, it raised scalability concerns and lacked the security features we wanted.
This blog post explores a superior solution: leveraging AWS DataSync. DataSync streamlines data transfers between various storage services, including NFS and EFS, even across accounts.
We will walk through the step-by-step process of using DataSync for this migration, ensuring secure, efficient, and scalable data movement. By the end of this post, you will be equipped to confidently migrate your data between NFS and EFS (or any other storage service that AWS DataSync supports).
Why Use DataSync for This Migration?
Migrating data, especially across accounts, can be a daunting task. Thankfully, DataSync offers a compelling solution:
- Simplified Setup: Forget complex scripting or custom tools. DataSync provides a user-friendly interface to configure your source (NFS) and destination (EFS) locations.
- Secure Transfers: DataSync encrypts data in transit with TLS and performs integrity checks, protecting the confidentiality and integrity of your data during the migration.
- Scalability: DataSync automatically scales based on your data volume, handling large datasets efficiently.
- Flexibility: Choose between one-time transfers or schedule regular syncs to keep your data in both locations up to date.
Setting the Stage for Migration
Before we delve into the steps, ensure you have:
- An NFS file system in Account A (source)
- An EFS file system created in Account B (destination)
- A private network connection between the two AWS accounts, using either VPC peering or AWS Transit Gateway
Architecture
The Migration Journey with DataSync
Deploying the DataSync Agent (Source AWS Account)
It is crucial to understand that AWS DataSync utilises an agent that runs on an EC2 instance to facilitate data transfers. While the DataSync service itself resides in the target account (Account B), the agent serves as a bridge in the source account (Account A).
Retrieving the DataSync AMI and Launching the Agent
The first step involves obtaining the DataSync AMI (Amazon Machine Image) specific to your region. You can achieve this using the following AWS CLI command, replacing <region> with your actual AWS region:
aws ssm get-parameter --name /aws/service/datasync/ami --region <region>
The command returns a parameter whose Value field holds the AMI ID. Use this ID to launch an EC2 instance in your source account (Account A) in the same VPC as your NFS file system.
For the agent, AWS recommends an M5 instance type: m5.2xlarge for tasks that transfer up to 20 million files, which comfortably covers a dataset like this one. This instance type offers a good balance of CPU, memory, and network bandwidth.
This EC2 instance will then host the DataSync agent, enabling communication with the DataSync service in the target account (Account B).
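The lookup and launch described above can also be scripted. The following is a minimal sketch rather than a definitive implementation: the function names, the subnet ID, and the security group ID are hypothetical placeholders, and it assumes the AWS CLI is configured with credentials for Account A.

```shell
# Print the DataSync agent AMI ID for the given Region.
datasync_ami_id() (
  region="$1"
  aws ssm get-parameter \
    --name /aws/service/datasync/ami \
    --region "$region" \
    --query 'Parameter.Value' \
    --output text
)

# Launch one m5.2xlarge agent instance from that AMI.
# subnet_id and sg_id are placeholders: use a subnet in the same VPC
# as the NFS server and a security group you have prepared for the agent.
launch_datasync_agent() (
  region="$1"; subnet_id="$2"; sg_id="$3"
  ami_id="$(datasync_ami_id "$region")" || exit 1
  aws ec2 run-instances \
    --region "$region" \
    --image-id "$ami_id" \
    --instance-type m5.2xlarge \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --count 1
)
```

For example, launch_datasync_agent us-east-1 subnet-0abc sg-0def would start the agent instance, with both IDs replaced by your own values.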
Set Up the AWS DataSync Service (Target AWS Account)
Create AWS DataSync Agent
In the target account, follow the steps outlined in the document linked below to create a VPC Endpoint for the DataSync service: https://docs.aws.amazon.com/datasync/latest/userguide/choose-service-endpoint.html
Within the target account (Account B), navigate to the AWS DataSync service and proceed to create an agent.
Select “Amazon EC2” as the agent type and opt for “VPC endpoints using AWS PrivateLink.”
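If you prefer the CLI to the console, the VPC endpoint for DataSync can be created with a single command. This is a sketch under assumptions: the function name and all IDs passed to it are hypothetical, and it assumes credentials for Account B. The linked AWS guide remains the reference for the endpoint settings.

```shell
# Create an interface VPC endpoint for the DataSync service.
# vpc_id, subnet_id and sg_id are placeholders for resources in Account B.
create_datasync_endpoint() (
  region="$1"; vpc_id="$2"; subnet_id="$3"; sg_id="$4"
  aws ec2 create-vpc-endpoint \
    --region "$region" \
    --vpc-id "$vpc_id" \
    --vpc-endpoint-type Interface \
    --service-name "com.amazonaws.${region}.datasync" \
    --subnet-ids "$subnet_id" \
    --security-group-ids "$sg_id"
)
```

The response includes the endpoint ID and its network interface IDs. Note the endpoint's private IP address, because the agent activation step needs it.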
How to Activate the Agent
- The agent is activated with an activation key. How you obtain the key depends on how you can reach the agent's EC2 instance, so pick whichever option below fits your network.
Option 1:
- If you have a VPN (virtual private network) connection from your local computer to the source VPC, you can reach the agent's private IP address directly from your browser.
- Complete the activation of your agent by providing the IP address of the EC2 instance deployed in the source account: https://docs.aws.amazon.com/datasync/latest/userguide/activate-agent.html
Option 2:
- If you cannot reach the agent's private IP address in a browser on port 80, deploy a Windows instance in a public subnet of the destination account (Account B). Over the private connection between the accounts, this instance can reach the private IP address of the DataSync agent deployed in the source account (Account A).
- After the Windows instance is deployed, launch a web browser on the Windows instance and log into the AWS Management Console. From the AWS Console, activate the agent by generating an activation key.
Option 3:
- Use AWS SSM (Systems Manager) to start a session to log in to the source account’s private EC2 instance or use SSH if you already have a bastion/jumpbox in the source account.
- Follow the instructions to obtain the activation key and enter it manually into AWS DataSync.
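For the manual route, the activation key can be requested from the agent over HTTP and then registered with the CLI. The sketch below follows the curl request format given in the AWS DataSync activation guide for VPC endpoints. The function names, the agent name nfs-to-efs-agent, and every IP, ID, and ARN passed in are assumptions for illustration. Run the key request from a host that can reach the agent on port 80, and run the registration with credentials for Account B.

```shell
# Build the activation URL for an agent that uses a VPC endpoint.
# agent_ip: private IP of the agent instance; endpoint_ip: private IP
# of the DataSync VPC endpoint in the target account.
activation_url() (
  agent_ip="$1"; region="$2"; endpoint_ip="$3"
  printf 'http://%s/?gatewayType=SYNC&activationRegion=%s&privateLinkEndpoint=%s&endpointType=PRIVATE_LINK&no_redirect\n' \
    "$agent_ip" "$region" "$endpoint_ip"
)

# Ask the agent for its activation key (takes the same three arguments).
get_activation_key() (
  curl -s "$(activation_url "$@")"
)

# Register the agent with DataSync in the target account.
register_agent() (
  region="$1"; key="$2"; endpoint_id="$3"; subnet_arn="$4"; sg_arn="$5"
  aws datasync create-agent \
    --region "$region" \
    --agent-name nfs-to-efs-agent \
    --activation-key "$key" \
    --vpc-endpoint-id "$endpoint_id" \
    --subnet-arns "$subnet_arn" \
    --security-group-arns "$sg_arn"
)
```

An activation key expires if it is not used within 30 minutes, so request the key and register the agent in one sitting.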
Configuring Source and Destination Locations
- Within the DataSync service, define your source location. Specify “NFS” as the location type, select the agent you activated, and provide the NFS server’s DNS name or IP address along with the mount path.
- Define the destination location by choosing “Amazon EFS” as the location type and selecting your EFS file system from Account B.
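The two locations can also be defined from the CLI in Account B. This is a minimal sketch: the function names are hypothetical, and the hostname, export path, and ARNs are placeholders you would replace with your own values.

```shell
# Source: the NFS export in Account A, reached through the agent.
create_nfs_location() (
  region="$1"; nfs_host="$2"; export_path="$3"; agent_arn="$4"
  aws datasync create-location-nfs \
    --region "$region" \
    --server-hostname "$nfs_host" \
    --subdirectory "$export_path" \
    --on-prem-config "AgentArns=$agent_arn"
)

# Destination: the EFS file system in Account B. The subnet and
# security group must allow DataSync to reach an EFS mount target.
create_efs_location() (
  region="$1"; efs_arn="$2"; subnet_arn="$3"; sg_arn="$4"
  aws datasync create-location-efs \
    --region "$region" \
    --efs-filesystem-arn "$efs_arn" \
    --ec2-config "SubnetArn=$subnet_arn,SecurityGroupArns=$sg_arn"
)
```

Each command returns a LocationArn, which the task creation step needs.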
Creating and Running the DataSync Task
- Create a DataSync task, specifying the source and destination locations you configured earlier.
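- Configure transfer options such as include and exclude filters based on file patterns, the transfer mode (transfer only data that has changed, or transfer all data), and, if you want recurring syncs, a schedule.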
- Initiate the task to begin the data migration process.
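The same steps can be sketched with the CLI. The function names, the task name nfs-to-efs-migration, and the exclude pattern are assumptions for illustration, and the option values shown are just one reasonable choice, not the only one.

```shell
# Create the task. TransferMode=CHANGED copies only data that differs
# from the destination; VerifyMode=ONLY_FILES_TRANSFERRED verifies
# the files that were actually copied. exclude is a simple pattern.
create_migration_task() (
  region="$1"; src_arn="$2"; dst_arn="$3"; exclude="$4"
  aws datasync create-task \
    --region "$region" \
    --name nfs-to-efs-migration \
    --source-location-arn "$src_arn" \
    --destination-location-arn "$dst_arn" \
    --excludes "FilterType=SIMPLE_PATTERN,Value=$exclude" \
    --options "VerifyMode=ONLY_FILES_TRANSFERRED,TransferMode=CHANGED" \
    --query TaskArn --output text
)

# Start one execution of the task and print its execution ARN.
start_migration() (
  region="$1"; task_arn="$2"
  aws datasync start-task-execution \
    --region "$region" \
    --task-arn "$task_arn" \
    --query TaskExecutionArn --output text
)
```

For example, create_migration_task us-east-1 "$SRC_ARN" "$DST_ARN" '*.tmp' would skip temporary files, where the pattern and the two variables are placeholders. Because the task only copies changed data, running start_migration again later acts as an incremental sync.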
Monitoring and Verification (Optional)
- The DataSync console offers task status and progress details.
- Once the transfer is complete, verify data integrity on the destination EFS file system in Account B.
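Progress can also be polled from a terminal instead of the console. The following sketch uses hypothetical function names and assumes the execution ARN printed when the task was started. It treats SUCCESS and ERROR as the final states and keeps polling on anything else.

```shell
# Print the current status of a task execution.
task_execution_status() (
  region="$1"; exec_arn="$2"
  aws datasync describe-task-execution \
    --region "$region" \
    --task-execution-arn "$exec_arn" \
    --query Status --output text
)

# Poll until the execution finishes. Returns 0 on SUCCESS and 1 on
# ERROR or if the status cannot be read. interval is in seconds.
wait_for_task() (
  region="$1"; exec_arn="$2"; interval="${3:-60}"
  while :; do
    status="$(task_execution_status "$region" "$exec_arn")" || exit 1
    echo "status: $status"
    case "$status" in
      SUCCESS) exit 0 ;;
      ERROR) exit 1 ;;
    esac
    sleep "$interval"
  done
)
```

A successful run passes through states such as TRANSFERRING and VERIFYING before reaching SUCCESS. As an extra check afterwards, you can mount the EFS file system and compare file counts or spot-check a few files against the source.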
Final Thoughts
In conclusion, AWS DataSync offers a robust and secure solution for migrating data between NFS and EFS file systems, even across AWS accounts. By leveraging a DataSync agent on an EC2 instance and a VPC endpoint for secure communication, you can achieve efficient and scalable data transfers.
This blog post has equipped you with the knowledge to set up a DataSync agent, configure the service, and initiate the migration process. Remember to tailor the security group settings for the EC2 instance and leverage IAM roles to grant appropriate access for a seamless data transfer experience.
With DataSync as your ally, say goodbye to complex scripting and hello to a streamlined migration journey for your valuable data.