Migration of an Analytics Data Warehouse into a Fully Platform Managed Solution on AWS
A UK Government Department had an Analytics Data Warehouse (ADW) hosted on legacy Windows and SQL server software on-premise, which was past end of life and had no ongoing support or maintenance. The department made a strategic decision to re-platform services to AWS, which nominated Version 1 to perform the SQL database migration.
Due to the size, complexity, and current software versions; the migration had a number of challenges such as migrating 270 databases with 8TB of data into a fully platform-managed solution with capacity constraints, translating and converting legacy functionality into a modern data platform, adhering to strict security and audit requirements. The challenges were all tackled methodically to successfully deliver the ADW migration.
Why Version 1?
Version 1 is an AWS Premier Consulting Partner, and has a deep and demonstrable knowledge of AWS. Providing managed services, architectural assessment, cost optimisation, and delivery success to customers.
AWS chose Version 1 as a delivery partner for this project due to our proven track record of AWS and project delivery for other Public and Private sector organisations.
Amazon Relational Database Service
The identified solution was Amazon Relational Database Service (RDS). This is a fully platform-managed database hosting solution that aligns with the strategic objectives of the customer. RDS is a web service that makes it easier to deploy, operate, and scale a relational database to AWS cloud and provides cost-efficient, resizable capacity for industry-standard relational databases.
Other than optimising the application, Amazon RDS manages everything else such as high availability, database backups, patching, scaling, OS, and other server maintenance activities for the customer. The previous solution, running on a legacy version SQL Server 2000, required the migration to rearchitect the solution to transform to the RDS requirements. Optimising and fully utilising the benefits of hosting on a public cloud platform.
A detailed technical discovery took place to identify opportunities for rationalisation. 270 databases were reviewed in this process using the Version 1 Database Discovery Service to identify compatibility issues, discontinued features, and usage statistics for right sizing and rationalisation. This process identified 90 databases for retirement, and several features which would need to be transformed to Amazon Cloud Services.
The data from the Version 1 Database Discovery Service was fed into a feasibility study of the AWS RDS solution. The conclusion was to propose Multi-AZ deployment of four RDS instances to meet the capacity requirements and database redundancy for planned maintenance and availability zone failure.
The integration layer, of Extract Transform Load (ETL) and process automation, was migrated to AWS Cloud Services in line with the customer’s strategy. The integration packages were using legacy SQL Server Data Transformation Services (DTS). The new solution required these to be converted into Pentaho Data Integration (PDI). Berlin Job scheduler was used for process automation, replacing SQL Server Agent Jobs.
A solution for rotating passwords was required for SQL authentication accounts. HashiCorp Vault credential integration managed the automated rotating of passwords to satisfy the customer’s security requirements.
Database Migration Approach
The first phase of the migration was to use the Version 1 Database Discovery Service to capture the as-is environment. This provided a full picture of the features that were no longer supported by the target SQL Server 2019 hosted on Amazon RDS. The output was merged into the second phase of rationalisation, to retire databases and features no longer in use or be replaced by cloud-hosted services.
The database migration had two layers; the data later and the integration layer. The data layer migration approach was a Refactor Optimise, which involves minimal changes to the database to reduce the risk and complexity of migration while utilising the benefits of the modern cloud platform. The integration layer was a Rebuild, using the latest available AWS cloud-hosted services.
The data layer migration of Refactor Optimise had two phase. Firstly; migrate the databases hosted on 3 instances of SQL 2000 to the highest compatible version SQL Server 2008 R2, using the Multi-Platform Version 1 Database Migration service. The SQL Server 2008 R2 instance was an AWS-hosted virtual machine, an EC2 instance. In this phase, the databases were upgraded to the instance level, issues were identified, reported, and mitigated. The second phase was to Rehost the databases to the AWS RDS for SQL Server 2019 instances, again using the Version 1 Database Migration Service, via Amazon S3 storage.
The integration layer migration involved the rebuild of the DTS packages and ETL jobs using a DevOps approach.
Building the Environment
Terraform is an infrastructure as code (IaC) technology that lets you define both cloud and on-premise resources in human-readable configuration files which can be versioned, reused, and shared. This provides the ability to have a consistent workflow to provision and manage infrastructure throughout its lifecycle. Terraform provides repeatability of deployment, rebuild, maintenance, and the performing of repetitive tasks. The delivered development and UAT environment were fully composed using IaC, with extensive use of Terraform deployed with strict security requirements, while still adopting modern methods of provisioning infrastructure and services.
Deployments were highly automated through the build, test (including automated security tests), and deployment process for all environment changes. This facilitated the change and security controls required for a highly regulated environment. Terraform drove efficiencies, and implemented security and best practices in the development and UAT environments. It was achieved by the creation of automated on-demand processes which rebuilt the environments enabling rapid and repeatable deployments of baseline objects such as jobs.
The repeatable processes ensured build standards and consistent implementation of best practices for AWS RDS for SQL Server instances. In addition to this, consistent security standards through the automation of building baseline objects such as SAS, Denodo, and Pentaho application service accounts, audits, logins, and permissions.
Managing Secrets and Protecting Sensitive Data
HashiCorp Vault is a secret management system that is specifically designed to dynamically control, generate, and provide access to sensitive values, credentials, and tokens for specific applications or services if required. A modern system needs a solution that can meet its demand to access a multitude of secrets: API keys for external services, database credentials etc without compromising best security practices, and rolling credentials whilst maintaining a detailed audit log of object changes.
Vault was used on this engagement to store SAS, Denodo, and Pentaho service account credentials. The applications connect to the vault to retrieve applicable credentials for data sources seamlessly without exposing the password to the end user.
As a result of this engagement, the following benefits were delivered to the customer:
- Improved Deployment Velocity: The use of DevOps automation promoted the use of standardised templates, common services, and technologies, that facilitated a cost saving, use of best practice and pace of delivery required for Agile delivery mechanisms.
- Security First: The implementation of Data Encryption with Amazon RDS to secure data at rest, and the implementation of security policies using HashiCorp enabled the migration to public cloud to be compliant with government standards.
- High Availability, Redundancy, and Performance: Version 1 are experts with Data, trusted advisors, with a track record of delivering highly available, multi region solutions and performant systems in public cloud.
- ADW Migrated: The deliverables were achieved in 5 months, this included a fully provisioned Development and UAT environment containing 2 ‘db.m5.large’ SQL 2019 RDS instances, with 8 databases, 2 vCPU, and 8GiB memory per instance. The Production and Pre-Production build and migration were prepared for migration via Terraform as per the Department’s requirements for segregation of duties.
News & Insights
News: February 15, 2024
When you’ve been in the technology mix for a while, you bear witness to your fair share of horizon events. Bubbles that have blown and burst, pocket-sized consumer tech that reshaped society and now...
Events: 23 February 2024
AI Tribes – Join Version 1 alongside fellow experts and visionaries at the frontline of AI, ML, and Data Science
Version 1 is a key sponsor and fireside chat speaker at the upcoming AI tribes event in Dublin of Feb 23rd. AI tribes is a Mini-Summit centred around connection + collaboration for the AI, Data + ML communities...
Webinars: February 8, 2024
In this session join Version 1 and our guests Socitm Advisory, Oracle, and Royal Borough of Kensington and Chelsea for this 1-hour session as we discuss and share some of the considerations that UK Local...