
Configure backup and restore

Data Migrator's backup and restore feature creates a snapshot of application settings and configuration files so that you can quickly restore a Data Migrator instance to an earlier state.

A restored Data Migrator instance recreates all migrations on the same paths as before without attempting to reconcile with earlier completed migrations. The following cases also apply:

  • Running with "Overwrite": Source data is transferred again and overwrites the data on the target.
  • Running with "Skip if size match": Source data isn't transferred again, as long as the file size on the source and target match.

You can back up and restore the application state for data and metadata migrations using the REST API, Data Migrator command line interface (CLI), or User Interface (UI).

Backup

The following details show you what data is covered in a backup and what commands are available to manage your backups.

Here's what a Data Migrator backup includes

The backup archive files are stored in /opt/wandisco/livedata-migrator/db/backups by default. Once a backup file has been unzipped, the relevant files are stored in:

  • Configs: /opt/wandisco/livedata-migrator/db/backups/configs/etc/wandisco/livedata-migrator
  • Objects: /opt/wandisco/livedata-migrator/db/backups/objects
| Backed-up function | Object/configuration (see above paths) | Description |
|---|---|---|
| Application properties | configs/etc/wandisco/livedata-migrator/application.properties | Application configuration. See Configure Data Migrator. |
| Bandwidth settings | objects/BandwidthPolicy.json | Settings that limit Data Migrator's use of available network bandwidth. See Manage your bandwidth limit. |
| Additional configuration properties | objects/ConfigurationPropertiesWrapper.json | Requeue and max migration configuration. |
| Data transfer agents | objects/DataAgents.json | Settings that define data transfer agents. Data Migrator attempts to register agents again with the information provided from the backup. See Register an agent. |
| Email registration | objects/EmailRegistrations.json | Email address and type. |
| Environmental configuration | configs/etc/wandisco/livedata-migrator/vars.env | Environment variables stored in vars.env. |
| Exclusions | objects/RegexExclusions.json, objects/FileSizeExclusions.json, objects/DateExclusions.json | Settings for file and directory patterns that you want to exclude from migrations. See Configure exclusions. |
| Logging configuration | configs/etc/wandisco/livedata-migrator/logback-spring.xml | Logging variables stored in logback-spring.xml. |
| Migrations | objects/Migrations.json | Settings that define data migrations. See Create a migration. |
| Path mapping | objects/PathMappings.json | Settings that create alternate paths for specific target filesystems. See Create path mappings. |
| Secure keys for filesystem access | Optional configuration files such as /etc/wandisco/livedata-migrator/application.properties | Secret configuration entries are masked using the logging property obfuscate.json.properties. See Masking secret properties. |
| Schedule of backups | objects/ScheduleConfig.json | Backup schedule configuration. |
| SMTP configuration | objects/SmtpConfigurations.json | SMTP settings. |
| Source configuration | objects/FileSystemConfigurations.json | Settings that define the source filesystem. See Configure source filesystem. |
| Targets | objects/FileSystemConfigurations.json | Settings that define target filesystems. See Configure target filesystems. |
note

Backed-up configuration files are not restored automatically. The steps for manually restoring these files are listed in Manually restore configuration files.

Here's what a metadata backup includes

A metadata backup file can include the following objects and configurations:

| Backed-up function | Object/configuration (path in backup file) | Description |
|---|---|---|
| Agent configuration | objects/AgentConfigs.json | Configuration for Hive Migrator agents. |
| Application properties | configs/etc/wandisco/hivemigrator/application.properties | Application configuration for Hive Migrator. See Configure Hive Migrator. |
| Backup schedule | objects/BackupSchedule.json | The schedule configuration for metadata backups. |
| Environmental configuration | configs/etc/wandisco/hivemigrator/vars.sh | Application environment variables. |
| Instance ID | configs/etc/wandisco/hivemigrator/instanceId | An identifier for the Hive Migrator instance. |
| Logging | configs/etc/wandisco/hivemigrator/log4j2.yaml | Logging configuration. |
| Migrations | objects/Migrations.json | Hive Migrator migrations. |
| Replication rules | objects/ReplicationRules.json | Hive Migrator DB and table replication patterns. |
| State information | objects/Conditions.json | Application state configuration. For example, this flags if the source agent was auto-discovered. |

Here's what a Data Migrator backup doesn't include

Currently, the following objects and configurations aren't included in a backup:

| Object/configuration | Description |
|---|---|
| Certificates | Encryption keys and certificates are not included. |
| LDAP/Access control settings | LDAP/Access control settings are backed up but are not automatically applied after an instance is restored. Re-enable the feature manually. See Manage user access using LDAP. |
| License file | Product license files are not included. These must be manually restored. |

To manually add files to a backup, see Add extra files to a backup.

Backup configuration (data migrations)

The following data migration backup configuration parameters are stored in /etc/wandisco/livedata-migrator/application.properties.

| Parameter | Description | Default | Recommendation |
|---|---|---|---|
| backups.listMaxSize | The maximum number of backup entries returned by the REST API GET command. | 1000 | Same as default |
| backups.namePrefix | The prefix added to generated backup files. | lm2backup | Same as default |
| backups.location | The file path where backup files are stored. | ${install.dir}db/backups (fresh installation), ${install.dir}db/backupDir (upgrade from an earlier version) | Same as default |
| backups.filePaths[N] | A path to a file that you want to include in a backup. Change [N] to an integer. You can add multiple file paths by repeating the entry with incremental numbering, for example, backups.filePaths[0], backups.filePaths[1], backups.filePaths[2]. | Commented out by default | Same as default |

To apply any changes, restart the Data Migrator service.
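
For example, here's a minimal sketch of overriding these settings in application.properties. The /backup/ldm path is hypothetical; the other values match the defaults above:

Example backup settings in application.properties
backups.listMaxSize=1000
backups.namePrefix=lm2backup
# Hypothetical non-default location for backup archives
backups.location=/backup/ldm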

Backup configuration (metadata migrations)

The following metadata migration backup configuration parameters are stored in /etc/wandisco/hivemigrator/application.properties.

| Parameter | Description | Default | Recommendation |
|---|---|---|---|
| hivemigrator.backup.location | The file path where Hive Migrator backup files are stored. | /opt/wandisco/hivemigrator/backups | Same as default |
| hivemigrator.backup.listMaxSize | The maximum number of backup entries returned by the REST API GET command. | 1000 | Same as default |
| backups.namePrefix | The prefix added to generated Hive Migrator backup files. | hvmbackup | Same as default |

To apply any changes, restart the Hive Migrator service.
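
As an illustration, a sketch that relocates Hive Migrator backups. The path /backup/hvm is a hypothetical example; use any directory writable by the Hive Migrator service user:

Example backup settings in the Hive Migrator application.properties
# Hypothetical non-default location for Hive Migrator backup archives
hivemigrator.backup.location=/backup/hvm
hivemigrator.backup.listMaxSize=1000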

Masking secret properties

Sensitive or secret information stored in backup files is made unreadable using the property obfuscate.json.properties, located in /etc/wandisco/livedata-migrator/application.properties. The default value includes the following list of filesystem-based parameters:

${hdfs.fs.type.masked.properties},${adls2.fs.type.masked.properties},
${s3a.fs.type.masked.properties},${gcs.fs.type.masked.properties}

Secret properties for data transfer agents are also masked. They're listed in the following parameter:

agent.secret.properties=clientSecret,clientCertKey

Each parameter lists multiple JSON request property values. These values are masked (substituted with random characters) so that anyone viewing the file can't read them:

| Filesystem mask parameter | Description | Default |
|---|---|---|
| ${agent.secret.properties} | Properties for data transfer agents to communicate between the Data Migrator server and the agent. | clientSecret,clientCertKey |
| ${hdfs.fs.type.masked.properties} | HDFS filesystem properties to be masked. | hdfs.example.secretKey1,hdfs.example.secretKey2 |
| ${adls2.fs.type.masked.properties} | Azure Data Lake Storage Gen2 filesystem properties to be masked. | fs.secret.Key,sharedKey,fs.oauth2.client.secret,oauthClientSecret |
| ${s3a.fs.type.masked.properties} | Amazon S3a filesystem properties to be masked. | fs.s3a.access.key,fs.s3a.secret.key,secretKey,accessKey |
| ${gcs.fs.type.masked.properties} | Google Cloud service properties to be masked. | fs.gs.auth.service.account.private.key.id,fs.gs.auth.service.account.private.key,privateKey,privateKeyId,jsonKeyFile,p12KeyFile |
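
For illustration only, a masked entry in a backed-up copy of application.properties might look like the following sketch. The property names come from the defaults above; the values are made-up placeholders standing in for the random substitutions:

Example masked entries in a backed-up configuration file
# Values below were randomized when the backup was created (placeholders only)
fs.s3a.access.key=q8WzK1xTf3LbN0hS
fs.s3a.secret.key=Zr81LbNwPc0sXq4tJm2v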
info

Review your configuration files
Don't assume that the default masking will cover all sensitive properties. Review your configuration files, adding additional masked.properties parameters as required.

  • For new installations, backup files are stored in /opt/wandisco/livedata-migrator/db/backups. For upgrades, the previous default location, /opt/wandisco/livedata-migrator/db/backupDir, may be used. To change this location, set a different path using the backups.location parameter. See Backup configuration (data migrations).
  • Backup files have the following filename pattern:
    lm2backup-DateTime-mig(MigrationNumber).zip
    For example: lm2backup-20220711135000.8420-mig7.zip.
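
Because the timestamp is embedded in each filename, backups sort chronologically by name. A quick sketch for finding the most recent one, assuming the default location:

Find the most recent data backup
ls -1 /opt/wandisco/livedata-migrator/db/backups | sort | tail -n 1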

Add extra files to a backup

To add extra files to a backup, use the following steps:

  1. Open /etc/wandisco/livedata-migrator/application.properties in a text editor.
  2. Add a backups.filePaths[N] parameter for each file, with the file's path. Each parameter name needs to be unique, so ensure that the bracketed [N] is changed to an integer and incremented for each copy of the parameter. For example:
    backups.filePaths[0]=/file-to-be-backed-up/file1
    backups.filePaths[1]=/file-to-be-backed-up/file2
    backups.filePaths[2]=/file-to-be-backed-up/file3
  3. Save the file.
  4. Restart Data Migrator. See System service commands - Data Migrator.
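
To confirm that the extra files are picked up, one approach is to trigger an immediate backup through the REST API and list the archive contents without extracting it. A sketch, reusing the hypothetical /file-to-be-backed-up paths from step 2:

Verify extra files appear in a new backup
curl -X POST "http://127.0.0.1:18080/backups"
cd /opt/wandisco/livedata-migrator/db/backups
# List the newest archive's contents and filter for the added files
unzip -l "$(ls -1 | sort | tail -n 1)" | grep file-to-be-backed-up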

Inspect the contents of a data backup file

Use the following commands to check the contents of a backup file:

Navigate to the backups directory
cd /opt/wandisco/livedata-migrator/db/backups
ls -l
total 64
-r-------- 1 hdfs hdfs 6751 Jul 5 12:19 lm2backup-20220705121918.8780-mig0.zip
-r-------- 1 hdfs hdfs 6751 Jul 5 12:34 lm2backup-20220705123406.3700-mig3.zip
-r-------- 1 hdfs hdfs 6751 Jul 5 12:34 lm2backup-20220705123418.9220-mig3.zip
-r-------- 1 hdfs hdfs 6753 Jul 7 07:17 lm2backup-20220707071729.9360-mig7.zip
-r-------- 1 hdfs hdfs 5334 Jul 11 12:49 lm2backup-20220711124912.2670-mig9.zip
-r-------- 1 hdfs hdfs 5334 Jul 11 13:13 lm2backup-20220711131301.5990-mig9.zip
-r-------- 1 hdfs hdfs 6753 Jul 11 13:48 lm2backup-20220711134845.7880-mig9.zip
-r-------- 1 hdfs hdfs 6753 Jul 11 13:50 lm2backup-20220711135000.8420-mig9.zip

Select a backup file that you want to inspect and run the following unzip command:

Unzip a selected data backup file
unzip lm2backup-20220711135000.8420-mig9.zip
Resulting output
inflating: objects/BandwidthPolicy.json
inflating: objects/EmailRegistrations.json
inflating: objects/PathMappings.json
inflating: objects/FileSystemConfiguration.json
inflating: objects/Migrations.json
inflating: objects/RegexExclusions.json
inflating: objects/ScheduleConfig.json
inflating: objects/DateExclusions.json
inflating: objects/FileSizeExclusions.json
inflating: configs/etc/wandisco/livedata-migrator/application.properties
inflating: configs/etc/wandisco/livedata-migrator/vars.env
inflating: configs/etc/wandisco/livedata-migrator/logback-spring.xml
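
You can also inspect an archive without extracting it: unzip -l lists its contents, and unzip -p prints a single file to standard output. For example (python3 is used here only as a convenient JSON formatter):

Inspect a backup without extracting it
unzip -l lm2backup-20220711135000.8420-mig9.zip
# Pretty-print one backed-up object
unzip -p lm2backup-20220711135000.8420-mig9.zip objects/Migrations.json | python3 -m json.tool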
danger

Files not present in their default locations aren't backed up. Missing files don't trigger a notification or log error.

Schedule backups

Data Migrator supports automatic scheduled backups, but they aren't enabled by default.

A backup is created as soon as a backup schedule is first enabled or later updated. After that, backups are created according to the schedule period parameter.

note

Example:
You enable a backup schedule and set it to 600 (minutes). Data Migrator immediately creates a backup, then creates another backup every 600 minutes. If you change the schedule to 60, Data Migrator immediately creates another backup and then creates backups every 60 minutes.

Inspect the contents of a metadata backup file

Use the following commands to check the contents of a metadata backup file:

Navigate to the backups directory
cd /opt/wandisco/hivemigrator/backups
ls -l
total 308
-r-------- 1 hive hadoop 3468 Aug 17 15:21 hvmbackup-20220817152118.7840-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:28 hvmbackup-20220817152818.7820-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:35 hvmbackup-20220817153518.7800-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:42 hvmbackup-20220817154218.7790-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:45 hvmbackup-20220817154527.1330-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:49 hvmbackup-20220817154918.7800-mig0.zip
-r-------- 1 hive hadoop 3468 Aug 17 15:56 hvmbackup-20220817155618.7790-mig0.zip

Select a backup file that you want to inspect and run the following unzip command:

Unzip a selected metadata backup file
unzip hvmbackup-20220817155618.7790-mig0.zip
Resulting output
inflating: objects/BackupSchedule.json
inflating: objects/AgentConfigs.json
inflating: objects/Conditions.json
inflating: configs/etc/wandisco/hivemigrator/application.properties
inflating: configs/etc/wandisco/hivemigrator/instanceId
inflating: configs/etc/wandisco/hivemigrator/vars.sh
inflating: configs/etc/wandisco/hivemigrator/log4j2.yaml
danger

Files not present in their default locations aren't backed up. Missing files don't trigger a notification or log error.

Backup commands

The backup and restore feature can be managed through the UI, REST API, and Data Migrator's CLI. Select where you want to manage backups:

UI commands

Manage backups and restore from a backup using the UI. The available actions are described below.

Configure backup and restore operations for data and metadata migrations from the Configuration > Backup and restore section of the UI.

Create a backup schedule (data migrations)

  1. Sign in to the UI.
  2. Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
  3. Select Backup and restore from the Configuration links on the side menu.
  4. Select the Schedule data backups checkbox to create a scheduled backup.
  5. [Optional] Enter a backup frequency in minutes. The default is 60.
  6. Select Apply schedule. You'll get a notification that the schedule was applied.

Create a backup schedule (metadata migrations)

  1. Sign in to the UI.
  2. Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
  3. Select Backup and restore from the Configuration links on the side menu.
  4. Select the Schedule metadata backups checkbox to create a scheduled metadata backup.
  5. [Optional] Enter a backup frequency in minutes. The default is 60.
  6. Select Apply schedule. You'll get a notification that the schedule was applied.

Create immediate backups

Select Back up now to create immediate data and metadata backups, ignoring any schedule settings. You can verify that the backup files were created by checking the UI Notifications screen or by selecting Restore from backup, which contains a complete list that you can search by date range.

API commands

Data migrations

You can use the REST API to handle backup and restore operations with scripted automation. To make manual API calls, you can also use the web interface of the Swagger-based REST API documentation.

The REST API commands use the following endpoint:

http://<ldm-hostname>:18080/backups/

Create a backup

Use the following command to create a backup file and store it in the backups directory:

Create a backup
curl -X POST "http://127.0.0.1:18080/backups"
Resulting output
{
  "createdAt" : 1657547400842,
  "size" : 16753,
  "migrationsCount" : 7,
  "backupName" : "lm2backup-20220711135000.8420-mig7.zip"
}

List backup files

Use the following command to list the backup files that have already been created:

List backups
curl -X GET "http://127.0.0.1:18080/backups"
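
For scripted automation, pipe the response through a JSON processor. A sketch using jq (installed separately), assuming the endpoint returns an array of backup descriptors like the single-backup output above:

List backup names only
curl -s "http://127.0.0.1:18080/backups" | jq -r '.[].backupName'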

Create a backup schedule

Use the following command to create a backup schedule:

Enable a schedule to create a backup every 480 minutes
curl -X PUT "http://127.0.0.1:18080/backups/config/schedule/" -H 'Content-Type:application/json' -d '{"enabled": true, "periodMinutes": 480}'
Resulting output
{
  "enabled" : true,
  "periodMinutes" : 480
}

Review existing schedule configuration

Use the following command to verify that a schedule is enabled:

Check backup schedule settings
curl -X GET "http://127.0.0.1:18080/backups/config/schedule/"
Resulting output
{
  "enabled" : true,
  "periodMinutes" : 480
}
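
To pause scheduled backups, a sketch assuming the same endpoint accepts enabled set to false:

Disable the backup schedule
curl -X PUT "http://127.0.0.1:18080/backups/config/schedule/" -H 'Content-Type:application/json' -d '{"enabled": false, "periodMinutes": 480}'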

Metadata migrations

You can use the REST API to handle backup and restore operations with scripted automation. To make manual API calls, you can also use the web interface of the Swagger-based REST API documentation.

The Hive Migrator REST API is served on port 6780. Its Swagger-based documentation is available at the following endpoint:

http://<ldm-hostname>:6780/docs

Create a metadata backup

Use the following command to create a metadata backup file and store it in the backups directory:

Create a backup
curl -X POST "http://127.0.0.1:6780/backups"
Resulting output
{
  "createdAt": 1660751127133,
  "size": 3468,
  "migrationsCount": 2,
  "backupName": "hvmbackup-20220817154527.1330-mig2.zip"
}

  • Backup files are stored in /opt/wandisco/hivemigrator/backups. To change this location, set a different path using the hivemigrator.backup.location parameter. See Backup configuration (metadata migrations).
  • Backup files have the following filename pattern:
    hvmbackup-DateTime-mig(MigrationNumber).zip
    For example: hvmbackup-20220817154527.1330-mig2.zip.

List metadata backup files

Use the following command to list the metadata backup files that have already been created:

List backups
curl -X GET "http://127.0.0.1:6780/backups"

Get backup details

Use the following command to view the details of a specified metadata backup:

Get details about a backup
curl -X 'GET' \
'http://127.0.0.1:6780/backups/hvmbackup-20220817155618.7790-mig0.zip' \
-H 'accept: application/json'
Resulting output
{
  "createdAt": 1660751778779,
  "size": 3468,
  "migrationsCount": 0,
  "backupName": "hvmbackup-20220817155618.7790-mig0.zip"
}

Schedule backups

Data Migrator supports automatic scheduled metadata backups, but they aren't enabled by default.

A metadata backup is created as soon as a metadata backup schedule is first enabled or later updated. After that, backups are created according to the schedule period parameter.

Create a backup schedule

Use the following command to create a backup schedule:

Enable a schedule to create a backup every 14 minutes
curl -X PUT "http://127.0.0.1:6780/backups/schedule" -H 'Content-Type:application/json' -d '{"enabled": true, "periodMinutes": 14}'
Resulting output
{
  "enabled": true,
  "periodMinutes": 14
}

Review existing schedule configuration

Use the following command to verify that a schedule is enabled:

Check backup schedule settings
curl -X GET "http://127.0.0.1:6780/backups/schedule/"
Resulting output
{
  "enabled" : true,
  "periodMinutes" : 14
}

CLI commands

Available backup and restore commands are listed in the command reference page with the other Data Migrator CLI commands:

Data backup commands

Metadata backup commands

Restore from backup

Use the restore function to return Data Migrator and Hive Migrator to an earlier state, as recorded in a stored backup file. The restore command is often run on a reinstalled instance with no existing state. To restore to an existing Data Migrator instance, use the following steps:

Delete the Data Migrator default database

These steps remove current Data Migrator settings such as migrations, path mappings, and exclusions.

  1. Open a terminal on the Data Migrator instance.

  2. Switch to the root user or use sudo -i.

  3. Navigate to the database directory:

    cd /opt/wandisco/livedata-migrator/db/
  4. Delete the instance's default database directory (to keep a copy instead, see the sketch after these steps):

    rm -r default-db
  5. Restart Data Migrator to initialize the empty database. See Data Migrator service commands.
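
If you'd rather keep a copy of the old state than delete it outright, a cautious variant of step 4 renames the directory instead. This sketch assumes, per step 5, that Data Migrator only needs default-db to be absent from its original path:

Rename the database directory instead of deleting it
mv default-db "default-db.$(date +%Y%m%d).bak"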

Delete Hive Migrator default database

These steps remove current Hive Migrator settings.

  1. Open a terminal on the Data Migrator instance.
  2. Switch to the root user or use sudo -i.
  3. Navigate to the database directory:
    cd /opt/wandisco/hivemigrator/
  4. Delete the instance's default Hive Migrator database file:
    rm hivemigrator.db.mv.db
  5. Delete the Hive Migrator configuration backups:
    rm /etc/wandisco/hivemigrator/agents.yaml.bck
    rm /etc/wandisco/hivemigrator/hive-migrator.yaml.bck
  6. Restart the Hive Migrator service. See Hive Migrator service commands.

Manually restore configuration files

These steps must be completed using the command line:

  1. Unzip the data or metadata backup file to retrieve the backed-up configuration files. See Inspect the contents of a data backup file and Inspect the contents of a metadata backup file.

    The following configuration files are backed up by default:

    Data backup
     6122  07-11-2022 13:50   configs/etc/wandisco/livedata-migrator/application.properties
      377  07-11-2022 13:50   configs/etc/wandisco/livedata-migrator/vars.env
    11914  07-11-2022 13:50   configs/etc/wandisco/livedata-migrator/logback-spring.xml
    Metadata backup
     1752  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/application.properties
       13  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/instanceId
      697  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/vars.sh
     4239  08-17-2022 15:56   configs/etc/wandisco/hivemigrator/log4j2.yaml
  2. Rename the current configuration files.

    Example rename, apply to all backed-up configuration files
    cd /etc/wandisco/livedata-migrator/
    mv application.properties application.properties.replaced
  3. Move the backed-up config files into their correct location. A combined sketch for all of the data configuration files follows these steps.

    Example move, apply to all backed-up configuration files
    mv /backup/files/location/application.properties /etc/wandisco/livedata-migrator/application.properties
  4. Restart services. See System service commands.

  5. Continue to the restore steps for the UI, REST API, or CLI, described under Restore commands below.
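
As mentioned in step 3, here's a combined sketch that renames each current data configuration file and moves the backed-up copy into place. It assumes the backed-up files were extracted to the example directory /backup/files/location; adapt the directories and file list for the metadata configuration files:

Rename current files and restore backed-up copies
cd /etc/wandisco/livedata-migrator
for f in application.properties vars.env logback-spring.xml; do
  mv "$f" "$f.replaced"               # keep the current file as a fallback
  mv "/backup/files/location/$f" "$f" # move the backed-up copy into place
done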

Restore commands

The following details show how backup files are used to restore Data Migrator and Hive Migrator to an earlier state.

UI commands

  1. Sign in to the UI.
  2. Under Instances, select the Data Migrator instance name. The default name is LDM - localhost.
  3. Select Backup and Restore from the Configuration section of the menu.
  4. Select Restore from backup.
  5. Select the three-dot button for the Data or Metadata backup file from which to restore your instance.
  6. Select Restore.
  7. Check the notifications for confirmation that the backup restored successfully.

API commands

Data restore command

Restore Data Migrator from a backup by using the following curl command:

Restore from backup
curl -X POST "http://127.0.0.1:18080/backups/restore/<backup-file-name>.zip"
Resulting output
{
  "createdAt" : 1657024458922,
  "size" : 26751,
  "migrationsCount" : 0,
  "backupName" : "lm2backup-20220705123418.9220-mig0.zip"
}

The following notification will appear in the UI:

Backup successfully restored notification

Metadata restore command

Restore Hive Migrator from a backup by using the following curl command:

Restore from backup
curl -X POST "http://127.0.0.1:6780/backups/restore/<backup-file-name>.zip"
Resulting output
{
  "createdAt" : 1673367251093,
  "size" : 4819,
  "migrationsCount" : 1,
  "backupName" : "hvmbackup-20230110161411.0930-mig1.zip"
}

The following notification will appear in the UI:

Backup successfully restored notification

CLI commands

Available backup and restore commands are listed in the command reference page with the other Data Migrator CLI commands:

Data restore command

Metadata restore command