
Configure backup and restore

Data Migrator's backup and restore feature creates a snapshot of application settings and configuration files so that you can quickly restore a Data Migrator instance to an earlier state.

A restored Data Migrator instance recreates all migrations on the same paths as before without attempting to reconcile with earlier completed migrations. The following cases also apply:

  • Running with "Overwrite": Source data is transferred again and overwrites the data on the target.
  • Running with "Skip if size match": Source data isn't transferred again, as long as the file size on the source and target match.

You can back up and restore the application state for data and metadata migrations using the REST API, Data Migrator command line interface (CLI), or User Interface (UI).

Backup

The following details show you what data is covered in a backup and what commands are available to manage your backups.

Here's what a Data Migrator backup includes

The backup archive files are stored in /opt/wandisco/livedata-migrator/db/backups by default. Once a backup file has been unzipped, the relevant files are stored in:

  • Configs: /opt/wandisco/livedata-migrator/db/backups/configs/etc/wandisco/livedata-migrator
  • Objects: /opt/wandisco/livedata-migrator/db/backups/objects
| Backed-up function | Object/configuration (see above paths) | Description |
|---|---|---|
| Application properties | configs/etc/wandisco/livedata-migrator/application.properties | Application configuration. See Configure Data Migrator. |
| Bandwidth settings | objects/BandwidthPolicy.json | Settings that limit Data Migrator's use of available network bandwidth. See Manage your bandwidth limit. |
| Additional configuration properties | objects/ConfigurationPropertiesWrapper.json | Requeue and max migration configuration. |
| Data transfer agents | objects/DataAgents.json | Settings that define data transfer agents. Data Migrator attempts to register agents again with the information provided from the backup. See Register an agent. |
| Email registration | objects/EmailRegistrations.json | Email address and type. |
| Environmental configuration | configs/etc/wandisco/livedata-migrator/vars.env | Environment variables stored in vars.env. |
| Exclusions | objects/RegexExclusions.json, objects/FileSizeExclusions.json, objects/DateExclusions.json | Settings for file and directory patterns that you want to exclude from migrations. See Configure exclusions. |
| Logging configuration | configs/etc/wandisco/livedata-migrator/logback-spring.xml | Logging variables stored in logback-spring.xml. |
| Migrations | objects/Migrations.json | Settings that define data migrations. See Create a migration. |
| Path mapping | objects/PathMappings.json | Settings that create alternate paths for specific target filesystems. See Create path mappings. |
| Secure keys for filesystem access | Optional configuration files such as /etc/wandisco/livedata-migrator/application.properties | Secret configuration entries are masked using the logging property obfuscate.json.properties. See Masking secret properties. |
| Schedule of backups | objects/ScheduleConfig.json | Backup schedule configuration. |
| SMTP configuration | objects/SmtpConfigurations.json | SMTP settings. |
| Source configuration | objects/FileSystemConfigurations.json | Settings that define the source filesystem. See Configure source filesystem. |
| Targets | objects/FileSystemConfigurations.json | Settings that define target filesystems. See Configure target filesystems. |
note

Backed-up configuration files are not restored automatically. The steps for manually restoring these files are listed in Manually restore configuration files.

Here's what a metadata backup includes

A metadata backup file can include the following objects and configurations:

| Backed-up function | Object/configuration (path in backup file) | Description |
|---|---|---|
| Agent configuration | objects/AgentConfigs.json | Configuration for Hive Migrator agents. |
| Application properties | configs/etc/wandisco/hivemigrator/application.properties | Application configuration for Hive Migrator. See Configure Hive Migrator. |
| Backup schedule | objects/BackupSchedule.json | The schedule configuration for metadata backups. |
| Environmental configuration | configs/etc/wandisco/hivemigrator/vars.sh | Application environment variables. |
| Instance ID | configs/etc/wandisco/hivemigrator/instanceId | An identifier for the Hive Migrator instance. |
| Logging | configs/etc/wandisco/hivemigrator/log4j2.yaml | Logging configuration. |
| Migrations | objects/Migrations.json | Hive Migrator migrations. |
| Replication rules | objects/ReplicationRules.json | Hive Migrator DB and table replication patterns. |
| State information | objects/Conditions.json | Application state configuration. For example, this flags if the source agent was auto-discovered. |

Here's what a Data Migrator backup doesn't include

Currently, the following objects and configurations aren't included in a backup:

| Object/configuration | Description |
|---|---|
| Certificates | Encryption keys and certificates are not included. |
| LDAP/Access control settings | LDAP/Access control settings are backed up but are not automatically applied after an instance is restored. Re-enable the feature manually. See Manage user access using LDAP. |
| License file | Product license files are not included. These must be manually restored. |

To manually add files to a backup, see Add extra files to a backup.

Backup configuration (data migrations)

The following data migration backup configuration parameters are stored in /etc/wandisco/livedata-migrator/application.properties.

| Parameter | Description | Default | Recommendation |
|---|---|---|---|
| backups.listMaxSize | The maximum number of backup entries returned by the REST API GET command. | 1000 | Same as default |
| backups.namePrefix | The prefix added to generated backup files. | lm2backup | Same as default |
| backups.location | The file path where backup files are stored. | ${install.dir}db/backups (fresh installation), ${install.dir}db/backupDir (upgrade from an earlier version) | Same as default |
| backups.filePaths[N] | A path to a file that you want to include in a backup. Change [N] to an integer. You can add multiple file paths by repeating the entry with incremental numbering, for example, backups.filePaths[0], backups.filePaths[1], backups.filePaths[2]. | Commented out by default | Same as default |

To apply any changes, restart the Data Migrator service.
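
For example, here's a minimal sketch of overriding these settings in application.properties. The /backup/ldm path is hypothetical; the other values match the defaults above:

Example backup settings in application.properties
backups.listMaxSize=1000
backups.namePrefix=lm2backup
# Hypothetical non-default location for backup archives
backups.location=/backup/ldm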

Backup configuration (metadata migrations)

The following metadata migration backup configuration parameters are stored in /etc/wandisco/hivemigrator/application.properties.

| Parameter | Description | Default | Recommendation |
|---|---|---|---|
| hivemigrator.backup.location | The file path where Hive Migrator backup files are stored. | /opt/wandisco/hivemigrator/backups | Same as default |
| hivemigrator.backup.listMaxSize | The maximum number of backup entries returned by the REST API GET command. | 1000 | Same as default |
| backups.namePrefix | The prefix added to generated Hive Migrator backup files. | hvmbackup | Same as default |

To apply any changes, restart the Hive Migrator service.
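
As an illustration, a sketch that relocates Hive Migrator backups. The path /backup/hvm is a hypothetical example; use any directory writable by the Hive Migrator service user:

Example backup settings in the Hive Migrator application.properties
# Hypothetical non-default location for Hive Migrator backup archives
hivemigrator.backup.location=/backup/hvm
hivemigrator.backup.listMaxSize=1000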

Masking secret properties

Sensitive or secret information stored in backup files is made unreadable using the property obfuscate.json.properties, located in /etc/wandisco/livedata-migrator/application.properties. The default value includes the following list of filesystem-based parameters:

${hdfs.fs.type.masked.properties},${adls2.fs.type.masked.properties},
${s3a.fs.type.masked.properties},${gcs.fs.type.masked.properties}

Secret properties for data transfer agents are also masked. They're listed in the following parameter:

agent.secret.properties=clientSecret,clientCertKey

Each parameter lists multiple JSON request property values. These values are masked (substituted with random characters) so that anyone viewing the file can't read them:

| Filesystem mask parameter | Description | Default |
|---|---|---|
| ${agent.secret.properties} | Properties for data transfer agents to communicate between the Data Migrator server and the agent. | clientSecret,clientCertKey |
| ${hdfs.fs.type.masked.properties} | HDFS filesystem properties to be masked. | hdfs.example.secretKey1,hdfs.example.secretKey2 |
| ${adls2.fs.type.masked.properties} | Azure Data Lake Storage Gen2 filesystem properties to be masked. | fs.secret.Key,sharedKey,fs.oauth2.client.secret,oauthClientSecret |
| ${s3a.fs.type.masked.properties} | Amazon S3a filesystem properties to be masked. | fs.s3a.access.key,fs.s3a.secret.key,secretKey,accessKey |
| ${gcs.fs.type.masked.properties} | Google Cloud service properties to be masked. | fs.gs.auth.service.account.private.key.id,fs.gs.auth.service.account.private.key,privateKey,privateKeyId,jsonKeyFile,p12KeyFile |
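
For illustration only, a masked entry in a backed-up copy of application.properties might look like the following sketch. The property names come from the defaults above; the values are made-up placeholders standing in for the random substitutions:

Example masked entries in a backed-up configuration file
# Values below were randomized when the backup was created (placeholders only)
fs.s3a.access.key=q8WzK1xTf3LbN0hS
fs.s3a.secret.key=Zr81LbNwPc0sXq4tJm2v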
info

Review your configuration files
Don't assume that the default masking will cover all sensitive properties. Review your configuration files, adding additional masked.properties parameters as required.

  • For new installations, backup files are stored in /opt/wandisco/livedata-migrator/db/backups. For upgrades, the previous default location, /opt/wandisco/livedata-migrator/db/backupDir, may be used. To change this location, set a different path using the backups.location parameter. See Backup configuration (data migrations).
  • Backup files have the following filename pattern:
    lm2backup-DateTime-mig(MigrationNumber).zip
    For example: lm2backup-20220711135000.8420-mig7.zip.
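
Because the timestamp is embedded in each filename, backups sort chronologically by name. A quick sketch for finding the most recent one, assuming the default location:

Find the most recent data backup
ls -1 /opt/wandisco/livedata-migrator/db/backups | sort | tail -n 1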

Add extra files to a backup

To add extra files to a backup, use the following steps:

  1. Open /etc/wandisco/livedata-migrator/application.properties in a text editor.
  2. Add a backups.filePaths[N] parameter for each file, with the file's path. Each parameter name needs to be unique, so ensure that the bracketed [N] is changed to an integer and incremented for each copy of the parameter. For example:
    backups.filePaths[0]=/file-to-be-backed-up/file1
    backups.filePaths[1]=/file-to-be-backed-up/file2
    backups.filePaths[2]=/file-to-be-backed-up/file3
  3. Save the file.
  4. Restart Data Migrator. See System service commands - Data Migrator.
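
To confirm that the extra files are picked up, one approach is to trigger an immediate backup through the REST API and list the archive contents without extracting it. A sketch, reusing the hypothetical /file-to-be-backed-up paths from step 2:

Verify extra files appear in a new backup
curl -X POST "http://127.0.0.1:18080/backups"
cd /opt/wandisco/livedata-migrator/db/backups
# List the newest archive's contents and filter for the added files
unzip -l "$(ls -1 | sort | tail -n 1)" | grep file-to-be-backed-up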

Inspect the contents of a data backup file

Use the following commands to check the contents of a backup file:

Navigate to the backups directory
cd /opt/wandisco/livedata-migrator/db/backups
ls -l
total 64
-r-------- 1 hdfs hdfs 6751 Jul 5 12:19 lm2backup-20220705121918.8780-mig0.zip
-r-------- 1 hdfs hdfs 6751 Jul 5 12:34 lm2backup-20220705123406.3700-mig3.zip
-r-------- 1 hdfs hdfs 6751 Jul 5 12:34 lm2backup-20220705123418.9220-mig3.zip
-r-------- 1 hdfs hdfs 6753 Jul 7 07:17 lm2backup-20220707071729.9360-mig7.zip
-r-------- 1 hdfs hdfs 5334 Jul 11 12:49 lm2backup-20220711124912.2670-mig9.zip
-r-------- 1 hdfs hdfs 5334 Jul 11 13:13 lm2backup-20220711131301.5990-mig9.zip
-r-------- 1 hdfs hdfs 6753 Jul 11 13:48 lm2backup-20220711134845.7880-mig9.zip
-r-------- 1 hdfs hdfs 6753 Jul 11 13:50 lm2backup-20220711135000.8420-mig9.zip

Select a backup file that you want to inspect and run the following unzip command:

Unzip a selected data backup file
unzip lm2backup-20220711135000.8420-mig9.zip
Resulting output
inflating: objects/BandwidthPolicy.json
inflating: objects/EmailRegistrations.json
inflating: objects/PathMappings.json
inflating: objects/FileSystemConfiguration.json
inflating: objects/Migrations.json
inflating: objects/RegexExclusions.json
inflating: objects/ScheduleConfig.json
inflating: objects/DateExclusions.json
inflating: objects/FileSizeExclusions.json
inflating: configs/etc/wandisco/livedata-migrator/application.properties
inflating: configs/etc/wandisco/livedata-migrator/vars.env
inflating: configs/etc/wandisco/livedata-migrator/logback-spring.xml
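
You can also inspect an archive without extracting it: unzip -l lists its contents, and unzip -p prints a single file to standard output. For example (python3 is used here only as a convenient JSON formatter):

Inspect a backup without extracting it
unzip -l lm2backup-20220711135000.8420-mig9.zip
# Pretty-print one backed-up object
unzip -p lm2backup-20220711135000.8420-mig9.zip objects/Migrations.json | python3 -m json.tool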
danger

Files not present in their default locations aren't backed up. Missing files don't trigger a notification or log error.

Schedule backups

Data Migrator supports automatic scheduled backups, but they aren't enabled by default.

A backup is created as soon as a backup schedule is first enabled or later updated. After that, backups are created according to the schedule period parameter.

note

Example:
You enable a backup schedule and set it to 600 (minutes). Data Migrator immediately creates a backup, then creates another backup every 600 minutes. If you change the schedule to 60, Data Migrator immediately creates another backup and then creates backups every 60 minutes.

Inspect the contents of a metadata backup file

Use the following commands to check the contents of a metadata backup file:

Navigate to the backups directory
cd /opt/wandisco/hivemigrator/backups
ls -l
total 308
-r-------- 1 hive hadoop 3468 Aug 17 15:21 hvmbackup-20220817152118.7840-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:28 hvmbackup-20220817152818.7820-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:35 hvmbackup-20220817153518.7800-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:42 hvmbackup-20220817154218.7790-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:45 hvmbackup-20220817154527.1330-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:49 hvmbackup-20220817154918.7800-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:56 hvmbackup-20220817155618.7790-mig0.zip

Select a backup file that you want to inspect and run the following unzip command:

Unzip a selected metadata backup file
unzip hvmbackup-20220817155618.7790-mig0.zip
Resulting output
inflating: objects/BackupSchedule.json
inflating: objects/AgentConfigs.json
inflating: objects/Conditions.json
inflating: configs/etc/wandisco/hivemigrator/application.properties
inflating: configs/etc/wandisco/hivemigrator/instanceId
inflating: configs/etc/wandisco/hivemigrator/vars.sh
inflating: configs/etc/wandisco/hivemigrator/log4j2.yaml
danger

Files not present in their default locations aren't backed up. Missing files don't trigger a notification or log error.

Backup commands

The backup and restore feature can be managed through the UI, REST API, and Data Migrator's CLI. Select where you want to manage backups:

UI commands

Manage backups and restore from a backup using the UI. The available actions are described below.

Configure backup and restore operations for data and metadata migrations from the Configuration > Backup and restore section of the UI.

Create a backup schedule (data migrations)

  1. Sign in to the UI.
  2. Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
  3. Select Backup and restore from the Configuration links on the side menu.
  4. Select the Schedule data backups checkbox to create a scheduled backup.
  5. [Optional] Enter a backup frequency in minutes. The default is 60.
  6. Select Apply schedule. You'll get a notification that the schedule was applied.

Create a backup schedule (metadata migrations)

  1. Sign in to the UI.
  2. Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
  3. Select Backup and restore from the Configuration links on the side menu.
  4. Select the Schedule metadata backups checkbox to create a scheduled metadata backup.
  5. [Optional] Enter a backup frequency in minutes. The default is 60.
  6. Select Apply schedule. You'll get a notification that the schedule was applied.

Create immediate backups

Select Back up now to create immediate data and metadata backups, ignoring any schedule settings. You can verify that the backup files were created by checking the UI Notifications screen or by selecting Restore from backup, which contains a complete list that you can search by date range.

API commands

Data migrations

You can use the REST API to handle backup and restore operations with scripted automation. To make manual API calls, you can also use the web interface of the Swagger-based REST API documentation.

The REST API commands use the following endpoint:

http://<ldm-hostname>:18080/backups/

Create a backup

Use the following command to create a backup file and store it in the backups directory:

Create a backup
curl -X POST "http://127.0.0.1:18080/backups"
Resulting output
{
  "createdAt" : 1657547400842,
  "size" : 16753,
  "migrationsCount" : 7,
  "backupName" : "lm2backup-20220711135000.8420-mig7.zip"
}

List backup files

Use the following command to list the backup files that have already been created:

List backups
curl -X GET "http://127.0.0.1:18080/backups"
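
For scripted automation, pipe the response through a JSON processor. A sketch using jq (installed separately), assuming the endpoint returns an array of backup descriptors like the single-backup output above:

List backup names only
curl -s "http://127.0.0.1:18080/backups" | jq -r '.[].backupName'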

Create a backup schedule

Use the following command to create a backup schedule:

Enable a schedule to create a backup every 480 minutes
curl -X PUT "http://127.0.0.1:18080/backups/config/schedule/" -H 'Content-Type:application/json' -d '{"enabled": true, "periodMinutes": 480}'
Resulting output
{
  "enabled" : true,
  "periodMinutes" : 480
}

Review existing schedule configuration

Use the following command to verify that a schedule is enabled:

Check backup schedule settings
curl -X GET "http://127.0.0.1:18080/backups/config/schedule/"
Resulting output
{
  "enabled" : true,
  "periodMinutes" : 480
}
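
To pause scheduled backups, a sketch assuming the same endpoint accepts enabled set to false:

Disable the backup schedule
curl -X PUT "http://127.0.0.1:18080/backups/config/schedule/" -H 'Content-Type:application/json' -d '{"enabled": false, "periodMinutes": 480}'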

Metadata migrations

You can use the REST API to handle backup and restore operations with scripted automation. To make manual API calls, you can also use the web interface of the Swagger-based REST API documentation.

The Hive Migrator REST API is served on port 6780. Its Swagger-based documentation is available at the following endpoint:

http://<ldm-hostname>:6780/docs

Create a metadata backup

Use the following command to create a metadata backup file and store it in the backups directory:

Create a backup
curl -X POST "http://127.0.0.1:6780/backups"
Resulting output
{
  "createdAt": 1660751127133,
  "size": 3468,
  "migrationsCount": 2,
  "backupName": "hvmbackup-20220817154527.1330-mig2.zip"
}

  • Backup files are stored in /opt/wandisco/hivemigrator/backups. To change this location, set a different path using the hivemigrator.backup.location parameter. See Backup configuration (metadata migrations).
  • Backup files have the following filename pattern:
    hvmbackup-DateTime-mig(MigrationNumber).zip
    For example: hvmbackup-20220817154527.1330-mig2.zip.

List metadata backup files

Use the following command to list the metadata backup files that have already been created:

List backups
curl -X GET "http://127.0.0.1:6780/backups"

Get backup details

Use the following command to view the details of a specified metadata backup:

Get details about a backup
curl -X 'GET' \
'http://127.0.0.1:6780/backups/hvmbackup-20220817155618.7790-mig0.zip' \
-H 'accept: application/json'
Resulting output
{
  "createdAt": 1660751778779,
  "size": 3468,
  "migrationsCount": 0,
  "backupName": "hvmbackup-20220817155618.7790-mig0.zip"
}

Schedule backups

Data Migrator supports automatic scheduled metadata backups, but they aren't enabled by default.

A metadata backup is created as soon as a metadata backup schedule is first enabled or later updated. After that, backups are created according to the schedule period parameter.

Create a backup schedule

Use the following command to create a backup schedule:

Enable a schedule to create a backup every 14 minutes
curl -X PUT "http://127.0.0.1:6780/backups/schedule" -H 'Content-Type:application/json' -d '{"enabled": true, "periodMinutes": 14}'
Resulting output
{
  "enabled": true,
  "periodMinutes": 14
}

Review existing schedule configuration

Use the following command to verify that a schedule is enabled:

Check backup schedule settings
curl -X GET "http://127.0.0.1:6780/backups/schedule/"
Resulting output
{
  "enabled" : true,
  "periodMinutes" : 14
}

CLI commands

Available backup and restore commands are listed in the command reference page with the other Data Migrator CLI commands:

Data backup commands

Metadata backup commands

Restore from backup

Use the restore function to return Data Migrator and Hive Migrator to an earlier state, as recorded in a stored backup file. The restore command is often run on a reinstalled instance with no existing state. To restore to an existing Data Migrator instance, use the following steps:

Delete the Data Migrator default database

These steps remove current Data Migrator settings such as migrations, path mappings, and exclusions.

  1. Open a terminal on the Data Migrator instance.

  2. Switch to the root user or use sudo -i.

  3. Navigate to the database directory:

    cd /opt/wandisco/livedata-migrator/db/
  4. Delete the instance's default database directory (to keep a copy instead, see the sketch after these steps):

    rm -r default-db
  5. Restart Data Migrator to initialize the empty database. See Data Migrator service commands.
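
If you'd rather keep a copy of the old state than delete it outright, a cautious variant of step 4 renames the directory instead. This sketch assumes, per step 5, that Data Migrator only needs default-db to be absent from its original path:

Rename the database directory instead of deleting it
mv default-db "default-db.$(date +%Y%m%d).bak"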

Delete Hive Migrator default database

These steps remove current Hive Migrator settings.

  1. Open a terminal on the Data Migrator instance.
  2. Switch to the root user or use sudo -i.
  3. Navigate to the database directory:
    cd /opt/wandisco/hivemigrator/
  4. Delete the instance's default Hive Migrator database file:
    rm hivemigrator.db.mv.db
  5. Delete the Hive Migrator configuration backups:
    rm /etc/wandisco/hivemigrator/agents.yaml.bck
    rm /etc/wandisco/hivemigrator/hive-migrator.yaml.bck
  6. Restart the Hive Migrator service. See Hive Migrator service commands.

Manually restore configuration files

These steps must be completed using the command line:

  1. Unzip the data or metadata backup file to retrieve the backed-up configuration files. See Inspect the contents of a data backup file and Inspect the contents of a metadata backup file.

    The following configuration files are backed up by default:

    Data backup
     6122  07-11-2022 13:50   configs/etc/wandisco/livedata-migrator/application.properties
      377  07-11-2022 13:50   configs/etc/wandisco/livedata-migrator/vars.env
    11914  07-11-2022 13:50   configs/etc/wandisco/livedata-migrator/logback-spring.xml
    Metadata backup
     1752  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/application.properties
       13  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/instanceId
      697  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/vars.sh
     4239  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/log4j2.yaml
  2. Rename the current configuration files.

    Example rename, apply to all backed-up configuration files
    cd /etc/wandisco/livedata-migrator/
    mv application.properties application.properties.replaced
  3. Move the backed-up config files into their correct location. A combined sketch for all of the data configuration files follows these steps.

    Example move, apply to all backed-up configuration files
    mv /backup/files/location/application.properties /etc/wandisco/livedata-migrator/application.properties
  4. Restart services. See System service commands.

  5. Continue to the restore steps for the UI, REST API, or CLI, described under Restore commands below.
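
As mentioned in step 3, here's a combined sketch that renames each current data configuration file and moves the backed-up copy into place. It assumes the backed-up files were extracted to the example directory /backup/files/location; adapt the directories and file list for the metadata configuration files:

Rename current files and restore backed-up copies
cd /etc/wandisco/livedata-migrator
for f in application.properties vars.env logback-spring.xml; do
  mv "$f" "$f.replaced"               # keep the current file as a fallback
  mv "/backup/files/location/$f" "$f" # move the backed-up copy into place
done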

Restore commands

The following details show how backup files are used to restore Data Migrator and Hive Migrator to an earlier state.

UI commands

  1. Sign in to the UI.
  2. Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
  3. Select Backup and Restore from the Configuration section of the menu.
  4. Select Restore from backup.
  5. Select the three-dot button for the Data or Metadata backup file from which to restore your instance.
  6. Select Restore.
  7. Check the notifications for confirmation that the backup restored successfully.

API commands

Data restore command

Restore Data Migrator from a backup by using the following curl command:

Restore from backup
curl -X POST "http://127.0.0.1:18080/backups/restore/<backup-file-name>.zip"
Resulting output
{
  "createdAt" : 1657024458922,
  "size" : 26751,
  "migrationsCount" : 0,
  "backupName" : "lm2backup-20220705123418.9220-mig0.zip"
}

The following notification will appear in the UI:

Backup successfully restored notification

Metadata restore command

Restore Hive Migrator from a backup by using the following curl command:

Restore from backup
curl -X POST "http://127.0.0.1:6780/backups/restore/<backup-file-name>.zip"
Resulting output
{
  "createdAt" : 1673367251093,
  "size" : 4819,
  "migrationsCount" : 1,
  "backupName" : "hvmbackup-20230110161411.0930-mig1.zip"
}

The following notification will appear in the UI:

Backup successfully restored notification

CLI commands

Available backup and restore commands are listed in the command reference page with the other Data Migrator CLI commands:

Data restore command

Metadata restore command