Introduction
We’re helping a lot of our customers migrate their application workloads over to AWS right now. Cloud migrations come in many flavours, and the right strategy depends on many factors, not least what a customer wants to migrate and how much of it there is.
For instance, does the customer want to get their fleet of servers out of a data centre as quickly as possible in order to decrease their infrastructure costs? If so, a “like for like” migration is probably the most appropriate, re-hosting those servers on AWS EC2 instances (what’s known as a “lift and shift”).
Perhaps the customer wants to re-host their application and web servers but also take the opportunity to re-platform some of their database servers to AWS RDS to cut costs and administration overhead in what’s known as a “lift and reshape” migration.
Or maybe the customer wants to move to AWS and adopt more cloud-native tools and practices (as well as reducing the licensing costs of commercial off-the-shelf, or COTS, software) by re-factoring their workloads completely.
A number of our customers fall into this third category. Specifically, they have identified reducing the cost of their current database engine technology as a key driver for their migration. Some are already in the process of migrating their SQL Server data and codebase over to PostgreSQL, while others are seriously thinking about it.
In this article I’ll talk about some of the challenges and pitfalls that await you on this journey, and discuss the AWS tools available to help you along the way.
SQL Server vs PostgreSQL
Microsoft SQL Server is a very popular RDBMS and historically speaking, probably the first choice of database engine when working within the Microsoft technology stack. But its licensing model can be highly restrictive and result in a high cost of ownership if the database or databases are of significant size, or are used by a significant number of clients. It provides a very user-friendly interface and is easy to learn and relatively easy to administer. This has resulted in Microsoft SQL Server having a large installed user base.
PostgreSQL is the world’s most advanced open source database. The PostgreSQL community is very strong and it continues to improve upon existing features as well as constantly implementing new ones. As per the database engine popularity ranking (https://db-engines.com/en/ranking), PostgreSQL continues to rise in popularity and, as of October 2020, is quickly homing in on the third-placed SQL Server. All this from a free database engine!
If comparing the two engines solely on cost, you should definitely go with PostgreSQL, which is free. If you have some spare time you can check the pricing for SQL Server here: https://www.microsoft.com/en-au/sql-server/sql-server-2019-pricing. (Why oh why do they make it so fiendishly difficult to calculate?)
PostgreSQL is a cross platform database engine and is available for Windows, Mac and *NIX platforms whilst the vast majority of production SQL Server installations only run on Microsoft’s Windows operating system. Whilst both engines are supported by AWS RDS, SQL Server on RDS doesn’t support certain features such as log shipping, database mail and extended stored procedures. In addition to a slightly limited feature set, SQL Server on RDS restricts the underlying volume usage with all data and log files essentially sitting on a D: drive.
In certain environments, some Administrators might find these limitations unacceptable. In contrast, AWS is constantly building more support and adding features to both Postgres on RDS (currently at version 12.3) and Aurora RDS with Postgres compatibility (currently at version 11.8). Also while SQL Server is now available as a Docker image, it’s not really ready for “prime time” and exists more for development and testing purposes rather than for hosting production data.
PostgreSQL is open source and completely free. Also, AWS and other third-party organisations offer support and hosting for PostgreSQL with pricing dependent on the vendor.
Microsoft SQL Server pricing is dependent on the number of server cores, user connections and engine version. If you’ve clicked on the link above for SQL Server 2019 pricing, you will have got an appreciation for why the question “how much does SQL Server cost?” is not an easy one to answer! SQL Server Express, a free version, can be used for small amounts of data. Similarly, SQL Server Developer is free to use but not for production instances. For production environments, SQL Server can either be licensed per server, with a license required for every device that connects to the database server, or it can be licensed per core of the database server, with a minimum 4-core license requirement. Costs can range from US$1,108 for SQL Server Standard Edition with 1 user up to US$27,496 for SQL Server Enterprise on a 4-core server [1]. This is just the tip of the iceberg, as various discounts, volume deals and other variations on pricing can be negotiated with Microsoft, complicating things even further!
Although both SQL Server and PostgreSQL are ANSI SQL-compliant database engines, there are still plenty of differences between their SQL syntax, data types and case sensitivity, amongst other things. It is therefore not a trivial exercise to migrate your schema, data and database code objects. This is where some AWS tools come to the rescue!
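To give a flavour of those differences, here is a small, hand-written comparison (the ProductNote table is purely illustrative and not part of AdventureWorks): square-bracketed, mixed-case identifiers fold to lower case, NVARCHAR/BIT/DATETIME become varchar/boolean/timestamp, IDENTITY becomes an identity column, and TOP becomes LIMIT.

-- T-SQL (SQL Server)
CREATE TABLE [ProductNote] (
    [ProductNoteID] INT IDENTITY(1,1) PRIMARY KEY,
    [Note]          NVARCHAR(400) NOT NULL,
    [IsActive]      BIT NOT NULL DEFAULT (1),
    [CreatedAt]     DATETIME NOT NULL DEFAULT (GETDATE())
);
SELECT TOP (10) * FROM [ProductNote] ORDER BY [CreatedAt] DESC;

-- PostgreSQL equivalent
CREATE TABLE productnote (
    productnoteid integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    note          varchar(400) NOT NULL,
    isactive      boolean NOT NULL DEFAULT true,
    createdat     timestamp NOT NULL DEFAULT now()
);
SELECT * FROM productnote ORDER BY createdat DESC LIMIT 10;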
[1] – Pricing information represents an Open No Level (NL) estimated retail price
AWS Schema Conversion Tool
AWS provides two separate tools for migrating databases. The AWS Data Migration Service (DMS) is used to migrate database data and the associated storage objects but not the full schema. The AWS Schema Conversion Tool (SCT) is used to migrate the database schema in full (including both storage objects and code objects) but not the data. We’re going to be looking at the AWS SCT in this article.
You can use the AWS SCT to convert your existing database schema from one database engine to another. You can convert a relational OLTP schema, or data warehouse schema. The converted schema is suitable for Amazon RDS MySQL, MariaDB, Oracle, SQL Server, PostgreSQL, an Amazon Aurora DB cluster, or an Amazon Redshift cluster. The converted schema can also be used with a database engine on an Amazon EC2 instance or stored as data on an Amazon S3 bucket.
For our example, we’re going to look at the AdventureWorks sample database which has been restored to SQL Server on an RDS instance running the SQL Server Express DB engine.
The AdventureWorks database is available from https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver15&tabs=ssms.
The AWS SCT provides a GUI to help visualise the source and target database schemas and to guide you through the different stages of the conversion process. If the schema from your source database can’t be converted automatically, AWS SCT provides guidance via reports on how you can create an equivalent schema in the target database engine of your choice. The AWS SCT is available for Microsoft Windows, Apple macOS and both Fedora and Ubuntu Linux.
For information about how to install the AWS SCT, see https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_Installing.html.
For an introduction to the AWS SCT user interface, see https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_UserInterface.html.
For an overview of the conversion process, see https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_GettingStarted.html.
I’ll be using the Apple macOS version of the AWS SCT in the following examples.
The AdventureWorks SCT Project
After installing the AWS SCT on your platform of choice and restoring the AdventureWorks database onto a suitable SQL Server instance, go ahead and open the SCT and create a new project as shown in Figure 1.
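If you want to confirm the restore succeeded before going any further, a quick query along these lines (assuming the backup was restored under its default AdventureWorks2017 name) should report a few hundred products:

-- Run against the SQL Server instance hosting the restored database
SELECT COUNT(*) AS ProductCount
FROM AdventureWorks2017.Production.Product;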
Enter the specific details of your project into the dialog as shown in Figure 2.
Connect to your source by clicking the “Connect to Microsoft SQL Server” toolbar button. Enter your source details and test the connection as shown in Figure 3.
Assuming the connection test was successful, click “OK”.
The source schema will now be displayed in a tree-view pane on the left hand side of the project screen as shown in Figure 4.
Using a suitable PostgreSQL instance – I’ve set one up in AWS using RDS running PostgreSQL version 12.3 – click on the “Connect to Amazon RDS for PostgreSQL” toolbar button to set up your target. Enter your target details and test the connection as shown in Figure 5.
Assuming the connection test was successful, click “OK”.
The target schema will now be displayed in a tree-view pane on the right hand side of the project screen as shown in Figure 6. There’s not a great deal to see here yet as we haven’t converted anything!
It’s probably a good idea at this point to save your project.
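If you’d like to double-check the target from outside the SCT, a quick query against the RDS endpoint (the host name below is a placeholder) confirms the engine version and that no converted schema exists yet:

-- psql -h your-instance.xxxxxxxxxxxx.ap-southeast-2.rds.amazonaws.com -U postgres -d postgres
SELECT version();
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name LIKE 'adventureworks%';   -- returns no rows until the converted schema is applied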
The Conversion Report
Go ahead and create your first conversion report by right-clicking the “AdventureWorks2017” node in the source tree-view and selecting “Create report” from the context menu as shown in Figure 7.
The conversion report is made up of three sections: the Executive Summary, Database Objects with Conversion Actions and Detailed Recommendations.
Executive Summary
The Executive Summary is a high level description of the conversion effort, breaking down the percentage coverage of changes that can be made automatically to both database storage objects and database code objects.
Database Objects with Conversion Actions
This section of the conversion report gives a visual representation of the conversion effort – “green is good”, “red is bad”!
Database storage objects and database code objects are represented by two histograms, with each object falling into one of four areas: converted automatically, or requiring simple, medium-complexity or complex manual conversion actions.
Detailed Recommendations
The final section of the report gives detailed recommendations of all conversions that cannot be achieved automatically.
For instance, a database storage object recommendation might be:
Issue 7647: PostgreSQL does not support *CLUSTERED* indexes.
Recommended action: Perform a manual conversion.
Issue code: 7647 | Number of occurrences: 1 | Estimated complexity: Simple
Documentation references: https://www.postgresql.org/docs/11/sql-createindex.html
Databases.AdventureWorks2017.Schemas.Production.Tables.BillOfMaterials.Indexes.AK_BillOfMaterials_Prod…
A database code object recommendation might be:
Issue 7706: Unable convert variable declaration of unsupported datatype
Recommended action: To store data of this type in PostgreSQL, use a PostgreSQL-compatible type or use a composite type.
Issue code: 7706 | Number of occurrences: 1 | Estimated complexity: Complex
Databases.AdventureWorks2017.Schemas.HumanResources.Procedures.uspUpdateEmployeeLogin: 402:418
The conversion report can be saved as either a PDF or CSV file.
Applying Automatic Conversions
Those database conversions that can be achieved automatically can be applied to the target database either on an individual object basis or in bulk by selecting the appropriate level in the target database schema tree-view pane on the right hand side of the project screen as shown in Figure 8.
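Once the conversions have been applied, a quick query on the target database lists the schemas the SCT has created, using the naming convention seen in the index example below (source database name plus schema):

-- List the converted schemas and how many tables each contains
SELECT table_schema, COUNT(*) AS table_count
FROM information_schema.tables
WHERE table_schema LIKE 'adventureworks2017_%'
GROUP BY table_schema
ORDER BY table_schema;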
Manual Conversions
Database conversions that can’t be automated by the SCT have to be assessed on a case by case basis and addressed manually.
Let’s have a look at issue 7647 in the conversion report.
As the description in the report for issue 7647 says, “PostgreSQL does not support *CLUSTERED* indexes.”
Using the SCT UI, we can drill down on this issue and see that this T-SQL code:
CREATE UNIQUE CLUSTERED INDEX [AK_BillOfMaterials_ProductAssemblyID_ComponentID_StartDate] ON [Production].[BillOfMaterials] ([ProductAssemblyID] ASC, [ComponentID] ASC, [StartDate] ASC);
has been converted into the following PL/pgSQL code:
CREATE UNIQUE INDEX ON adventureworks2017_production.billofmaterials USING BTREE (productassemblyid ASC, componentid ASC, startdate ASC);
In order to address this conversion issue manually, one of the two following actions would need to be taken:
1. Make [AK_BillOfMaterials_ProductAssemblyID_ComponentID_StartDate] a non-clustered index in the source database and re-run the conversion report / PL/pgSQL generation steps.
   a) This may well be acceptable if there are no performance issues associated with replacing the clustered index.
2. Leave the generated PL/pgSQL B-tree index as is and set up a job on your server to run the PostgreSQL CLUSTER command on a scheduled basis, if having the data physically reordered on disk based on the index is critical (see the sketch below).
   a) See https://www.postgresql.org/docs/current/sql-cluster.html for more details.
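For the second option, CLUSTER needs an index name to order the table by, so the SCT-generated index would first need an explicit name. A minimal sketch, using a hypothetical index name in place of the SCT-generated statement above:

-- Give the converted index an explicit name (the name here is hypothetical)
CREATE UNIQUE INDEX ak_billofmaterials_prodassembly_component_startdate
ON adventureworks2017_production.billofmaterials
USING BTREE (productassemblyid ASC, componentid ASC, startdate ASC);

-- Physically reorder the table on that index. CLUSTER is a one-off operation,
-- so schedule it (e.g. via cron or pg_cron) if the ordering needs to be maintained.
CLUSTER adventureworks2017_production.billofmaterials
USING ak_billofmaterials_prodassembly_component_startdate;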
Recommendations
While the AWS SCT can help with the conversion of your source database by automating a lot of the effort, a fair amount of manual conversion will still need to be done depending on the complexity of your database and its storage and code objects.
One strategy would be to impose a database code freeze and perform the migration to Postgres as one “big bang”. This may not be a luxury you have available if there is constant, ongoing development happening in the back end.
An alternative strategy would be a multi-phase approach:
- start making changes in your SQL Server T-SQL now that will migrate automatically using the AWS SCT (see the example after this list)
- enforce current T-SQL coding standards to support this effort
- introduce AWS SCT report generation into your DB release pipeline to monitor the evolution of the 4 “action areas”
- obviously the goal here would be increasing “green” and decreasing “red”
- form a Tiger Team to identify the challenges of the medium to hard complexity database code migrations and perform technical spikes in those areas
- the team would ideally comprise experts in both SQL Server T-SQL and PostgreSQL PL/pgSQL
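As a small illustration of the first point, some SQL Server-specific constructs can be swapped for ANSI-standard equivalents today: they remain perfectly valid T-SQL, so they can ship in normal releases, but they map directly onto their PostgreSQL equivalents with little or no conversion. A hand-written sketch against the AdventureWorks Person.Person table (not SCT output):

-- SQL Server specific forms the SCT has to translate:
SELECT TOP (1) ISNULL(Title, 'n/a') AS Title, GETDATE() AS QueriedAt
FROM Person.Person
ORDER BY ModifiedDate DESC;

-- ANSI-standard forms that are already valid T-SQL (SQL Server 2012+) and carry
-- across to PostgreSQL (allowing for identifier case folding and schema naming):
SELECT COALESCE(Title, 'n/a') AS Title, CURRENT_TIMESTAMP AS QueriedAt
FROM Person.Person
ORDER BY ModifiedDate DESC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY;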