Moving Data in Azure – AzCopy in Government

Lately I’ve been having a lot of conversations about moving data around in Azure Government; it’s a pretty common requirement. The question then becomes: how do you do it?

The good news is that there’s an open-source tool called AzCopy that has been around for a long time. It’s very flexible and can be used in any Azure cloud. But with that flexibility can come confusion, and there isn’t a lot of documentation on how to use it in Azure Government.

Installing AzCopy

AzCopy is supported on both Linux and Windows.

If you want to install it on Linux (I used Ubuntu), these are the commands I ran:

wget https://aka.ms/downloadazcopy-v10-linux
tar -xvf downloadazcopy-v10-linux
sudo cp ./azcopy_linux_amd64_*/azcopy /usr/bin/

I also use the Azure CLI, which can be installed with:

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
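Before moving on, it’s worth a quick sanity check that both tools actually landed on your PATH. This is a minimal sketch; the check_tools function name is my own, not part of either tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each CLI used in this walkthrough is on the PATH.
# Returns 0 if both are found, 1 if either is missing.
check_tools() {
  local tool missing=0
  for tool in azcopy az; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_tools && azcopy --version && az version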

Authenticating to Azure

This is where most people get tripped up. The first step is to log in to the Azure CLI, because we are going to tell AzCopy to use the Azure CLI context for authentication.

az cloud set --name AzureUSGovernment
az login --use-device-code
az account show # Shows the signed-in account, including your tenant ID

The next part of the authentication setup is making sure AzCopy is configured for Azure Government. To do that, run the following commands:

export AZURE_ENVIRONMENT="AzureUSGovernment"
export AZCOPY_AUTO_LOGIN_TYPE=AZCLI
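Because a shell pointed at the wrong cloud is the most common failure here, I like to wrap these steps in a small function that refuses to continue unless the Azure CLI is targeting Azure Government. This is a sketch under a couple of assumptions: the function name is hypothetical, and exporting AZCOPY_TENANT_ID (taken from az account show) assumes your AzCopy release reads that variable, which you can confirm with azcopy env.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the Azure CLI targets Azure Government, then export
# the AzCopy settings from the steps above plus the tenant ID.
# Assumption: your AzCopy release reads AZCOPY_TENANT_ID (check with `azcopy env`).
use_gov_cloud_for_azcopy() {
  local cloud tenant
  cloud="$(az cloud show --query name -o tsv)" || return 1
  if [ "$cloud" != "AzureUSGovernment" ]; then
    echo "Active cloud is '$cloud'; run: az cloud set --name AzureUSGovernment" >&2
    return 1
  fi
  tenant="$(az account show --query tenantId -o tsv)" || return 1
  export AZURE_ENVIRONMENT="AzureUSGovernment"
  export AZCOPY_AUTO_LOGIN_TYPE=AZCLI
  export AZCOPY_TENANT_ID="$tenant"
}
```

Usage: after az login, run use_gov_cloud_for_azcopy && azcopy env to review the variables AzCopy recognizes.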

Permissions Required

This is another place where everyone gets tripped up. Having the Owner role in Azure doesn’t mean you can manipulate the data within a storage account: Owner grants management (control-plane) access, while reading and writing blobs requires a data-plane role. I recommend that your account have one of the following roles to ensure you can perform the copy:

  • Storage Blob Data Contributor
  • Storage Blob Data Owner
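The built-in roles are named Storage Blob Data Contributor and Storage Blob Data Owner. If you need to grant one yourself, here’s a minimal sketch using the Azure CLI, scoped to a single storage account rather than the whole subscription. The function name and its arguments are hypothetical; you’d run it for both the source and destination accounts. Role assignments can take a few minutes to take effect, so don’t be surprised if the first copy attempt right afterwards is denied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant a blob data-plane role on one storage account.
# Usage: grant_blob_data_role <assignee> <storage-account> <resource-group> [role]
grant_blob_data_role() {
  local assignee="$1" account="$2" group="$3"
  local role="${4:-Storage Blob Data Contributor}"
  local scope
  # Look up the storage account's resource ID to use as the assignment scope.
  scope="$(az storage account show --name "$account" --resource-group "$group" --query id -o tsv)" || return 1
  az role assignment create --assignee "$assignee" --role "$role" --scope "$scope"
}
```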

You can use SAS tokens or account keys instead, but I strongly advise against that, because even enabling them on a storage account is not a good security practice.

Performing the copy

Once you’ve done that, the basics of performing a copy are pretty straightforward:

azcopy copy https://<source>.blob.core.usgovcloudapi.net/<container>/<blob_name> https://<destination>.blob.core.usgovcloudapi.net/<container>/<blob_name>
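The key difference from commercial Azure is the endpoint suffix: Azure Government blob URLs end in blob.core.usgovcloudapi.net. To avoid typos in that suffix, here’s a small sketch of a URL builder that produces the account, container, and blob forms used in this post. The gov_blob_url name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Azure Government blob endpoint URL.
# Usage: gov_blob_url <account> [container] [blob]
gov_blob_url() {
  local account="$1" container="${2:-}" blob="${3:-}"
  local url="https://${account}.blob.core.usgovcloudapi.net/"
  if [ -n "$container" ]; then
    url+="${container}/"
    if [ -n "$blob" ]; then
      url+="${blob}"
    fi
  fi
  printf '%s\n' "$url"
}
```

Usage: azcopy copy "$(gov_blob_url mysource data report.csv)" "$(gov_blob_url mydest data report.csv)". Quoting the URLs keeps the shell from interpreting any special characters in blob names.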

The tool also supports several other types of copies.

Full Storage Account

azcopy copy https://<source>.blob.core.usgovcloudapi.net/ https://<destination>.blob.core.usgovcloudapi.net/ --recursive

Full Container

azcopy copy https://<source>.blob.core.usgovcloudapi.net/<container>/ https://<destination>.blob.core.usgovcloudapi.net/<container>/ --recursive

Preserve Last Modified Time, Access Tier, and Properties

Another cool feature is that you can preserve the metadata attached to each blob.

azcopy copy https://<source>.blob.core.usgovcloudapi.net/ https://<destination>.blob.core.usgovcloudapi.net/ --recursive \
    --preserve-last-modified-time \
    --s2s-preserve-access-tier \
    --s2s-preserve-properties

Sync Operations (One Way)

azcopy sync https://<source>.blob.core.usgovcloudapi.net/ https://<destination>.blob.core.usgovcloudapi.net/ --recursive --delete-destination true
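With --delete-destination true, sync deletes anything in the destination that doesn’t exist in the source, so getting the direction wrong can destroy data. A cautious pattern is to preview the sync first and only then run it for real. This sketch assumes an AzCopy v10 release that supports the --dry-run flag on sync (check azcopy sync --help); the safe_sync name is hypothetical. AzCopy also accepts --delete-destination prompt if you’d rather confirm deletions interactively.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: preview a one-way sync, then run it only if confirmed.
# Assumes your AzCopy release supports --dry-run on sync.
safe_sync() {
  local src="$1" dst="$2" answer
  # Dry run: print the planned operations without transferring or deleting anything.
  azcopy sync "$src" "$dst" --recursive --delete-destination true --dry-run || return 1
  read -r -p "Apply these changes? [y/N] " answer
  if [ "$answer" = "y" ] || [ "$answer" = "Y" ]; then
    azcopy sync "$src" "$dst" --recursive --delete-destination true
  else
    echo "Sync cancelled."
  fi
}
```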

Increasing Concurrency

Sometimes you need to move a lot of data. AzCopy performs a server-to-server copy within Azure, so the data doesn’t have to flow through the machine running AzCopy, and you can tune the copy to benefit from the scale available. For example, --cap-mbps caps the transfer rate (in megabits per second) and --block-size-mb sets the block size used for each transfer:

azcopy copy https://<source>.blob.core.usgovcloudapi.net/ https://<destination>.blob.core.usgovcloudapi.net/ --recursive --cap-mbps 1000 --block-size-mb 100
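The number of concurrent requests AzCopy makes is controlled by the AZCOPY_CONCURRENCY_VALUE environment variable, and AzCopy chooses its own default if you don’t set it. Here’s a sketch of a wrapper that sets it for a single run only. The run_tuned_copy name is hypothetical, and 256 is an arbitrary starting point rather than a tuned recommendation; measure on your own workload before raising it further.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a recursive copy with a chosen number of concurrent
# requests. AZCOPY_CONCURRENCY_VALUE is set only for this one azcopy invocation.
# Usage: run_tuned_copy <source-url> <destination-url> [concurrency]
run_tuned_copy() {
  local src="$1" dst="$2" concurrency="${3:-256}"
  AZCOPY_CONCURRENCY_VALUE="$concurrency" azcopy copy "$src" "$dst" --recursive
}
```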

Getting Job Status

If you want to get the list of copy jobs that you’ve submitted, you can run the following:

azcopy jobs list

If you want to monitor the status of your jobs continuously, you can refresh the list every second with watch:

watch -n 1 azcopy jobs list
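Once you have a job ID from the list, you can focus on that one job with azcopy jobs show, which reports its progress. Here’s a small sketch that wraps it in watch; the watch_job name and the 5-second default interval are my own choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: watch a single job's progress rather than the whole list.
# Usage: watch_job <job-id> [refresh-seconds]
watch_job() {
  local job_id="$1" interval="${2:-5}"
  watch -n "$interval" azcopy jobs show "$job_id"
}
```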

Resume Failed Jobs

This is my favorite feature: should there be a problem copying the data, you can resume a job where it left off by doing the following:

# Get the job ID
azcopy jobs list

azcopy jobs resume <job-id>
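Before resuming, it can help to see which transfers actually failed. This sketch lists them with azcopy jobs show and the --with-status filter (check azcopy jobs show --help on your version), then resumes the job. It assumes the Azure CLI login and the AzCopy environment variables from the authentication step are still set in the same shell; the retry_job name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the failed transfers in a job, then resume it.
# Assumes the Azure CLI login and AzCopy environment variables from the
# authentication step are still set in this shell.
retry_job() {
  local job_id="$1"
  if [ -z "$job_id" ]; then
    echo "usage: retry_job <job-id>" >&2
    return 2
  fi
  if [ "${AZCOPY_AUTO_LOGIN_TYPE:-}" != "AZCLI" ]; then
    echo "warning: AZCOPY_AUTO_LOGIN_TYPE is not AZCLI; the resume may fail to authenticate" >&2
  fi
  # Show only the transfers that failed, so you know what is being retried.
  azcopy jobs show "$job_id" --with-status=Failed
  azcopy jobs resume "$job_id"
}
```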