TL;DR
In secure enterprise cloud environments, traditional dbt deployment models often introduce unnecessary cost, security risk, or operational friction. This article describes an on-demand, containerised dbt execution model, orchestrated via AWS MWAA and backed by ephemeral ECS tasks. By treating dbt as a workload rather than a service, analytics transformations scale efficiently without always-on infrastructure, improving cost control, security posture, data quality observability, and CI/CD integration, while supporting medallion architectures in modern enterprise data lakes.
Introduction
Over the last few years, data platforms have shifted away from monolithic enterprise data warehouses toward modular, cloud-native architectures built on object storage, distributed compute, and declarative tooling. As these platforms mature, data transformation is increasingly recognised not as a purely operational concern, but as a form of engineering that benefits from the same discipline applied to software development.
Within this landscape, dbt (data build tool) has become a foundational component of modern analytics engineering. While often described as a SQL transformation tool, its broader impact lies in how it reframes data modelling, testing, and documentation as versioned, testable, and observable engineering artefacts.
This article explores an on-demand dbt execution model for secure cloud environments, one designed specifically for enterprise contexts where always-on services, long-lived credentials, and externally hosted SaaS platforms are either infeasible or undesirable.
dbt as an Analytics Engineering Framework
dbt provides a thin but powerful abstraction over data warehouse and data lake execution engines such as Spark, Trino, Athena, Snowflake, and BigQuery. Rather than introducing its own runtime, dbt focuses on enabling:
- Declarative data modelling using SQL
- Explicit dependency management between models
- Automated lineage generation
- Embedded data quality testing
- Self-documenting data assets
This design aligns naturally with how analytics teams already work, while introducing engineering discipline without imposing significant operational overhead.
Crucially, dbt allows transformation logic to move closer to analytics and domain teams, while still enabling platform teams to enforce governance, security, and deployment standards.
The Problem Space: Secure Data Lake Migrations
The architecture described here emerged from the construction of a migration data lake built on a medallion architecture:
- Bronze: raw ingested data
- Silver: standardised, validated, and conformed entities
- Gold: analytics-ready models aligned with business semantics
In practice, this migration was constrained by a set of non-negotiable enterprise requirements. These constraints fundamentally shaped how transformation tooling could be deployed.
Key constraints included:
- Network isolation
All workloads run inside a sealed VPN with no public internet exposure.
- Strict security and compliance controls
Persistent compute resources and unmanaged access paths were strongly discouraged.
- Multiple transformation domains
Independent dbt projects were required to support different subject areas and release cycles.
- Operational efficiency
Idle infrastructure and always-on transformation services were considered wasteful.
Under these conditions, traditional dbt deployment patterns, such as dbt Cloud or dbt hosted on permanently running compute, were either infeasible or misaligned with the platform’s security and cost objectives.
Rethinking dbt Execution: Why dbt Works Better as an On-Demand Workload
A key design decision was to treat dbt not as a long-running service, but as an ephemeral workload.
Instead of asking:
“Where should dbt live?”
The question became:
“When should dbt exist?”
This shift reframes dbt execution as something that is instantiated only when required, executes a well-defined scope of transformations, and is then fully torn down. dbt environments become transient by design, created for execution, not permanence.
This conceptual change underpins the architecture that follows.
Architecture Overview
At a high level, the architecture operates as follows, with a minimal orchestration sketch after the list:
- Amazon MWAA (Managed Workflows for Apache Airflow) orchestrates transformation workflows.
- Each workflow launches an ECS task (Fargate).
- The task runs a custom dbt container image, stored in Amazon ECR.
- The dbt container loads a dbt project from S3.
- The container executes dbt run, dbt test, or dbt docs generate.
- Execution outputs are persisted to the data lake.
- The container is terminated immediately after execution.
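To make this concrete, the following is a minimal sketch of what the orchestration layer could look like, using the Amazon provider's EcsRunTaskOperator. The cluster name, task definition family, container name, subnets, security group, and dbt selection shown here are hypothetical placeholders rather than a prescribed setup.

```python
# Minimal MWAA DAG sketch: launch an ephemeral dbt run on ECS Fargate.
# Cluster, task definition, subnet, and security group names are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

with DAG(
    dag_id="dbt_silver_daily",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",
    catchup=False,
) as dag:
    run_dbt = EcsRunTaskOperator(
        task_id="run_dbt",
        cluster="analytics-ephemeral",        # hypothetical ECS cluster
        task_definition="dbt-runner",         # task definition family for the dbt image in ECR
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {
                    "name": "dbt",
                    # The container entrypoint forwards these arguments to dbt.
                    "command": ["run", "--select", "silver"],
                }
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-private-a"],      # private subnets only
                "securityGroups": ["sg-dbt-runner"],
                "assignPublicIp": "DISABLED",         # no public internet exposure
            }
        },
    )
```

Keeping the DAG this thin is deliberate: Airflow handles scheduling and retries, while all transformation logic lives in the versioned dbt project and the container image.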
The dbt container image is built around:
- A cloud-native execution adapter (for example, dbt-spark backed by AWS Glue)
- Organisation-specific dependencies and configuration standards
Execution roles are tightly scoped and ephemeral, credentials are injected using cloud-native secret management mechanisms, and logs are streamed to centralised logging services for auditability and observability. Each run starts from a known, immutable image and exits cleanly, ensuring reproducibility and a minimal attack surface.
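As an illustration of the container side, the sketch below shows one possible entrypoint. It assumes the project is synced from an S3 location passed in via environment variables (DBT_PROJECT_BUCKET and DBT_PROJECT_PREFIX are hypothetical names) and that dbt-core 1.5+ is installed so the programmatic dbtRunner is available; the actual layout of projects and profiles will vary per organisation.

```python
# Sketch of a container entrypoint: pull the dbt project from S3, run dbt, exit.
# Bucket and prefix variable names and local paths are illustrative, not a prescribed layout.
import os
import sys

import boto3
from dbt.cli.main import dbtRunner  # available in dbt-core >= 1.5

PROJECT_DIR = "/tmp/dbt_project"


def sync_project_from_s3(bucket: str, prefix: str) -> None:
    """Download every object under the prefix into the local project directory."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            rel_path = obj["Key"][len(prefix):].lstrip("/")
            if not rel_path or rel_path.endswith("/"):
                continue
            local_path = os.path.join(PROJECT_DIR, rel_path)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, obj["Key"], local_path)


def main() -> int:
    sync_project_from_s3(os.environ["DBT_PROJECT_BUCKET"], os.environ["DBT_PROJECT_PREFIX"])
    # Arguments passed to the container (e.g. ["run", "--select", "silver"]) become the dbt command.
    args = sys.argv[1:] or ["run"]
    # Assumes profiles.yml is shipped alongside the project; adjust to your own layout.
    result = dbtRunner().invoke(args + ["--project-dir", PROJECT_DIR, "--profiles-dir", PROJECT_DIR])
    # A non-zero exit code propagates to ECS, so failures surface directly in Airflow.
    return 0 if result.success else 1


if __name__ == "__main__":
    sys.exit(main())
```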
Why On-Demand dbt Execution Improves Cost, Security, and Scale
1. Cost Efficiency
Provisioning compute only for the duration of a dbt run ensures that infrastructure costs scale linearly with actual usage. There is no idle transformation environment consuming resources between schedules or deployments.
2. Security and Compliance
Ephemeral execution significantly reduces risk:
- No long-lived credentials
- No persistent access paths
- No configuration drift over time
Each execution operates within tightly controlled IAM boundaries, inside private network segments, and terminates immediately after completion.
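The sketch below shows how these boundaries might be expressed in the task definition itself, using boto3. The role ARNs, secret ARN, image URI, and names are illustrative placeholders; the real policies attached to the task role are organisation-specific and should be scoped to the exact Glue, S3, and catalog resources the run needs.

```python
# Sketch: register an ECS task definition with a narrowly scoped task role
# and credentials injected from Secrets Manager only for the lifetime of the task.
# All ARNs, names, and the image URI below are illustrative placeholders.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="dbt-runner",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",
    memory="4096",
    # Role assumed by the dbt process itself: limited to the Glue/S3 paths it needs.
    taskRoleArn="arn:aws:iam::123456789012:role/dbt-runner-task-role",
    # Role used by ECS to pull the image from ECR and fetch secrets.
    executionRoleArn="arn:aws:iam::123456789012:role/dbt-runner-execution-role",
    containerDefinitions=[
        {
            "name": "dbt",
            "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/dbt-runner:1.0.0",
            "secrets": [
                {
                    # Injected as an environment variable at task start; never baked into the image.
                    "name": "WAREHOUSE_PASSWORD",
                    "valueFrom": "arn:aws:secretsmanager:eu-west-1:123456789012:secret:dbt/warehouse",
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/dbt-runner",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "dbt",
                },
            },
        }
    ],
)
```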
3. Operational Simplicity
This model eliminates several operational burdens associated with long-running hosts:
- Patch management
- Dependency drift between environments
- Manual intervention during failure recovery
Failures are isolated to individual executions rather than shared infrastructure.
4. Scalability and Isolation
Multiple dbt projects can execute in parallel without resource contention. Isolation at the container and execution level simplifies troubleshooting, capacity planning, and platform governance.
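One way to express this fan-out, assuming Airflow 2.3+ dynamic task mapping and the same illustrative cluster and task definition names used earlier, is sketched below; each project runs in its own container and fails independently of the others.

```python
# Sketch: fan out one ephemeral ECS task per dbt project with dynamic task mapping.
# Project names and infrastructure identifiers are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

DBT_PROJECTS = ["customer", "billing", "network_events"]  # hypothetical transformation domains

NETWORK_CONFIG = {
    "awsvpcConfiguration": {
        "subnets": ["subnet-private-a"],
        "securityGroups": ["sg-dbt-runner"],
        "assignPublicIp": "DISABLED",
    }
}

with DAG(
    dag_id="dbt_all_domains",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Each project gets its own isolated container; a failure stays local to that run.
    EcsRunTaskOperator.partial(
        task_id="run_dbt_project",
        cluster="analytics-ephemeral",
        task_definition="dbt-runner",
        launch_type="FARGATE",
        network_configuration=NETWORK_CONFIG,
    ).expand(
        overrides=[
            {"containerOverrides": [{"name": "dbt", "command": ["run", "--select", project]}]}
            for project in DBT_PROJECTS
        ]
    )
```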
Data Quality as a First-Class Output
dbt’s testing framework is frequently under-leveraged, with test outcomes treated as ephemeral execution artefacts rather than durable analytical assets. In this architecture, data quality is elevated to a first-class output of the platform.
Rather than relying on transient logs, all dbt test failures are materialised as Iceberg tables in Amazon S3, using an append-only design. Each test emits structured, queryable records enriched with execution metadata, including test name, model and column context, invocation identifiers, execution timestamps, and failure cardinality. This design ensures that every data quality signal is preserved immutably and can be analysed longitudinally.
By standardising test schemas across auditable test macros (e.g. not-null, accepted values, column expressions, and uniqueness constraints), results from heterogeneous models can be consolidated into unified audit views without post-hoc transformations. These audit datasets are then exposed via the AWS Glue Data Catalog and queried directly from Amazon Athena, enabling seamless downstream consumption.
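As a sketch of downstream consumption, the example below queries a consolidated audit table via the boto3 Athena client. The analytics_audit.dbt_test_failures table, its column names, the workgroup, and the results bucket are hypothetical and would follow the platform's own catalog conventions.

```python
# Sketch: query consolidated dbt test failures from Athena for trend analysis.
# Database, table, column, and bucket names are hypothetical placeholders.
import time

import boto3

athena = boto3.client("athena")

QUERY = """
SELECT test_name,
       model_name,
       date_trunc('day', executed_at) AS run_day,
       sum(failure_count)             AS failures
FROM analytics_audit.dbt_test_failures
WHERE executed_at >= date_add('day', -30, current_date)
GROUP BY 1, 2, 3
ORDER BY run_day DESC, failures DESC
"""

execution = athena.start_query_execution(
    QueryString=QUERY,
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://analytics-athena-results/dbt-audit/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then read the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows[1:]:  # the first row contains the column headers
        print([col.get("VarCharValue") for col in row["Data"]])
```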
Historical trends in data quality are visualised using analytics tooling such as Amazon QuickSight, allowing teams to identify systemic issues, unstable models, and regressions introduced by schema or logic changes. Crucially, this shifts data quality from a reactive, failure-driven concern to an observable and measurable platform capability, supporting quantitative assessment of reliability and continuous improvement over time.
CI/CD Integration and Continuous Validation
The same containerised dbt environment used in production is also executed within CI/CD pipelines.
A typical workflow includes:
- A pull request triggering a CI pipeline
- Execution of the dbt container with dbt test
- Persistence of results to a non-production schema
- Automated surfacing of regressions prior to merge
This approach ensures environmental parity between CI and production, faster feedback loops for engineers, and a reduced risk of introducing data regressions.
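A minimal CI gate might look like the sketch below, which reuses the production image via docker and fails the pipeline when any test fails. The image URI, environment variables, and target name are placeholders, and real pipelines would handle registry authentication, AWS credentials, and project packaging according to their own tooling.

```python
# Sketch of a CI gate: run dbt test inside the same container image used in production,
# against a non-production target, and fail the pipeline on any test failure.
# Image URI, environment variables, S3 locations, and the "ci" target are illustrative.
import subprocess
import sys

IMAGE = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/dbt-runner:1.0.0"


def main() -> int:
    result = subprocess.run(
        [
            "docker", "run", "--rm",
            "--env", "DBT_PROJECT_BUCKET=analytics-dbt-projects",      # hypothetical staging bucket
            "--env", "DBT_PROJECT_PREFIX=pull-requests/candidate",     # hypothetical PR prefix
            IMAGE,
            # Forwarded to the container entrypoint; assumes a "ci" target exists in profiles.yml
            # that resolves to a non-production schema.
            "test", "--target", "ci",
        ],
        check=False,
    )
    if result.returncode != 0:
        print("dbt tests failed; blocking merge.", file=sys.stderr)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main())
```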
dbt functions not only as a transformation engine, but also as a continuous validation framework.
Implications for Platform and Executive Stakeholders
For platform teams, this execution model provides:
- Clear governance boundaries
- Predictable operational behaviour
- A clean separation of concerns
For executives and data leaders, it delivers:
- Improved trust in analytical outputs
- Transparent, measurable data quality signals
- Lower total cost of ownership
- Faster delivery of analytics capabilities
Most importantly, it demonstrates that modern data platforms can be both agile and controlled, without compromising one for the other.
Closing Thoughts
dbt is often discussed in terms of features and tooling, but its deeper impact lies in how it encourages teams to think differently about data transformation.
By running dbt as an on-demand, containerised workload, analytics engineering can align more closely with modern cloud execution models, meet stringent enterprise security requirements, and scale transformation efforts without scaling operational burden.
This approach does not replace existing dbt deployment patterns. Instead, it extends dbt into environments where traditional models fall short.
As data platforms continue to evolve, architectures like this suggest that the future of analytics engineering is shaped not only by better tools, but by better execution models.
Interested in how on-demand dbt execution and modern analytics engineering patterns can improve security, scalability, and cost efficiency in enterprise cloud environments?
Explore more of our blogs, where we share practical perspectives on analytics engineering, data platforms, cloud architecture, and secure execution models for modern enterprises.

Dom is an accomplished Lead Data Engineer with over 20 years in technology and more than a decade dedicated to large-scale data platforms across telecom, health, and enterprise domains. He specialises in architecting and modernising cloud-native data platforms, building trusted analytics-ready datasets, and leading engineering teams to deliver robust, scalable, and cost-efficient pipelines.