
Business Continuity and Disaster Recovery Planning

To ease support and deployment, SD Elements is deployed on a single virtual appliance (Virtual Machine or VM) as a monolithic service. The VM runs nginx (front-end web proxy), apache (application server), postgres (database), and a few other small services (memcache, rabbitmq, etc.).

 

Recommended Service Level Objectives (SLO)

SD Elements is not a mission-critical system, so the impact of a service disruption is limited to reduced efficiency for product development teams. However, the data stored in SD Elements can affect the quality of the software those teams produce, which makes data integrity the more sensitive concern. We recommend the following SLOs:

Recovery Point Objective (RPO): 1 hour to 1 day

Recovery Time Objective (RTO): 1 hour to 1 business day

For larger deployments, we recommend targeting the lower end of the RPO and RTO ranges.

 

Backups

The system is designed to store all its active data in an easy-to-back-up database. A database backup can be loaded back into the appliance as long as the backup and restore are done using the same version of the software. (Note: In exceptional circumstances, the data can be migrated to a newer version, but never to an older version of the software.)

At minimum, we recommend taking a full virtual appliance backup before any attempt to update the software and again after a successful update, which normally happens once a month. If a system-level issue occurs, you can then restore the most recent stable version of the virtual appliance and use the data backup to load the latest stable data onto it.

Full virtual appliance backup

Since SD Elements runs in a virtualized environment, ready-to-use solutions such as the automated snapshots and backups provided by VMware vCenter, or similar products, offer an easy way of backing up the entire deployment.

For cases where virtualization-layer backup is not available, we offer an alternative method that uses rzbackup (a wrapper around zbackup) to take deduplicated differential backups of the entire operating system and its data from within the server. The process is straightforward to set up, and we can assist if need be.

Differential Backups of the Database

Our VMs can be configured to take hourly snapshots of their database (we recommend setting this up and testing it). Our VM administration tool (sde_admin) has a built-in command that does all the work: it dumps the data and creates the rotating deltas. (On our hosted SaaS servers we have it set up to run hourly.) These dumps end up in /docs/sde/backup and can be pulled off our VMs using rsync or any other backup agent.
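A freshness check along the following lines can confirm that the hourly dumps are actually arriving and that the newest one falls within the chosen RPO. This is a minimal sketch and not part of sde_admin: the function names are hypothetical, and the only assumption about the dumps is that they appear as regular files in the backup directory named above.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_seconds(backup_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the most recently modified file,
    or None if the directory is missing or contains no files."""
    directory = Path(backup_dir)
    if not directory.is_dir():
        return None
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in directory.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def meets_rpo(backup_dir: str, rpo_seconds: float, now: Optional[float] = None) -> bool:
    """True only if a backup exists and is no older than the RPO."""
    age = newest_backup_age_seconds(backup_dir, now)
    return age is not None and age <= rpo_seconds


if __name__ == "__main__":
    # /docs/sde/backup is the dump location named in this article;
    # 3600 seconds corresponds to the 1-hour end of the recommended RPO.
    ok = meets_rpo("/docs/sde/backup", rpo_seconds=3600)
    print("backup is within RPO" if ok else "backup is stale or missing")
```

Run from cron or a monitoring agent, a check like this turns a silently failing backup job into an alert before the RPO is breached.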

Full Backups of the Database

The SD Elements databases are fairly small, so it is also reasonable to maintain full backups of the database. Again, our administration tool has a wrapper around postgres’s built-in functionality that makes this straightforward.

Database replication

Using the backup methods described above, it is possible to replicate the application's database into the database of a warm standby duplicate server on an hourly basis. This can significantly reduce recovery time, because the standby system needs no new setup or configuration.

 

Summary of Suggested Deployment

Because we generally don’t treat SD Elements as a mission-critical system, it likely isn’t worth the cost, operational overhead, or complexity of deploying the system in a manner that meets classic high-availability requirements (e.g. five 9’s uptime). Our suggested approach for disaster recovery is to run a “warm” spare server.

Beginning with two identical servers, designate one as the live server and the other as the backup. The database on the live server is replicated to the database on the backup server, either using the built-in facilities of postgres or simply by doing periodic database dumps and imports. Users work only on the live server; no one interacts with the backup server.
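The "periodic database dumps and imports" option can be sketched with the standard PostgreSQL client tools. This is an illustration only and not the sde_admin wrapper: the database name, host name, and dump path are hypothetical placeholders, and the sketch assumes that pg_dump and pg_restore are on the PATH, that the backup server's postgres accepts connections from the live server, and that credentials are supplied separately (for example through a ~/.pgpass file).

```python
import subprocess
from typing import Callable, List


def dump_command(db_name: str, dump_path: str) -> List[str]:
    # -Fc writes PostgreSQL's custom archive format, which pg_restore reads.
    return ["pg_dump", "-Fc", "-f", dump_path, db_name]


def restore_command(db_name: str, dump_path: str, host: str) -> List[str]:
    # --clean --if-exists drops existing objects before recreating them,
    # so the backup server's database ends up mirroring the dump.
    return ["pg_restore", "--clean", "--if-exists",
            "-h", host, "-d", db_name, dump_path]


def replicate(db_name: str, dump_path: str, backup_host: str,
              run: Callable = subprocess.run) -> None:
    """Dump the live database, then load the dump into the backup server.
    check=True makes a failed step raise instead of continuing silently."""
    for cmd in (dump_command(db_name, dump_path),
                restore_command(db_name, dump_path, backup_host)):
        run(cmd, check=True)


if __name__ == "__main__":
    # All three values below are hypothetical placeholders.
    replicate("sde_example_db", "/tmp/sde_example.dump", "backup.example.internal")
```

Scheduling this hourly keeps the backup server within the 1-hour end of the RPO range; the `run` parameter exists only so the command sequence can be exercised without touching a real database.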

If the live server goes down, the backup server can be made active, for example by repointing a load balancer or changing DNS records. (You would need to mirror some of the file system configurations that are unique to each server: keyczar keys, certs, …)
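Deciding when to switch over can be automated with a simple health probe. The sketch below is one possible approach under stated assumptions, not a built-in SD Elements feature: the URL is a hypothetical placeholder, and the actual switch (load balancer or DNS change) is left to whatever tooling your environment provides. Requiring several consecutive failed probes avoids failing over on a brief network blip.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """True if the server answers with a non-error HTTP status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and HTTP 4xx/5xx responses.
        return False


def should_fail_over(probe: Callable[[], bool], attempts: int = 3,
                     delay_seconds: float = 10.0) -> bool:
    """Return True only if every one of `attempts` consecutive probes fails."""
    for i in range(attempts):
        if probe():
            return False
        if i < attempts - 1:
            time.sleep(delay_seconds)
    return True


if __name__ == "__main__":
    # Hypothetical address of the live server.
    live_url = "https://sde-live.example.internal/"
    if should_fail_over(lambda: is_healthy(live_url)):
        print("live server unreachable: activate the backup server")
    else:
        print("live server healthy")
```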
