Scintilla Solutions
Architecture Reference

Architecture Patterns

Detailed reference diagrams for the data architectures and platform migrations we design and deliver — combining Snowflake, Databricks, Kafka, dbt, and more.

🏗️

Lakehouse Architecture

Unified platform — lake economics + warehouse performance

Combines the low-cost, scalable storage of a data lake with the ACID transactions, schema enforcement, and query performance of a data warehouse. Open table formats (Apache Iceberg, Delta Lake) add a transactional metadata layer over files in object storage, so engines such as Databricks and Snowflake can serve SQL analytics, ML, and BI from a single copy of the data.

Apache Iceberg · Delta Lake · Databricks · Snowflake · AWS S3 / ADLS / GCS
Diagram: lakehouse layers
Layer 1, object storage: raw, unstructured and semi-structured data at cloud scale (AWS S3, Azure ADLS, GCS)
Layer 2, open table format: ACID transactions · schema evolution · time travel · partition pruning (Apache Iceberg, Delta Lake, Apache Hudi)
Layer 3, query engine: Databricks (Lakehouse Platform · Apache Spark · Unity Catalog · MLflow · Delta Sharing); Snowflake (Cloud Data Platform · native Iceberg tables · Snowpark · Cortex AI · Data Sharing)
Layer 4, serve: SQL analytics · ML / AI · BI dashboards · streaming · data sharing
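To make the Layer 2 guarantees concrete, here is a minimal plain-Python sketch of how an open table format can layer ACID commits and time travel over immutable files. It is a toy model under stated assumptions, not the Iceberg or Delta Lake API: `ToyTable`, `Snapshot`, and `CommitConflict` are hypothetical names, a dict stands in for object storage, and the "atomic" pointer swap is a simple compare-and-swap in memory.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Optional, Tuple


class CommitConflict(Exception):
    """Another writer committed first; the caller must re-read and retry."""


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    files: Tuple[str, ...]  # every data file that makes up the table at this version


@dataclass
class ToyTable:
    # path -> rows; a hypothetical stand-in for immutable Parquet files in object storage
    storage: Dict[str, List[dict]] = field(default_factory=dict)
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None
    _ids: Iterator[int] = field(default_factory=lambda: count(1))

    def append(self, rows: List[dict], expected_parent: Optional[int]) -> int:
        # 1. Write a new immutable data file; no reader can see it yet.
        new_id = next(self._ids)
        path = f"data/part-{new_id:05d}.parquet"
        self.storage[path] = list(rows)
        base = self.snapshots[expected_parent].files if expected_parent is not None else ()
        # 2. Publish it by moving the table's current-snapshot pointer in one step.
        return self._commit(Snapshot(new_id, expected_parent, base + (path,)))

    def _commit(self, snap: Snapshot) -> int:
        # Compare-and-swap: succeed only if no one else committed since we read.
        if snap.parent_id != self.current_id:
            raise CommitConflict(
                f"based on snapshot {snap.parent_id}, but table is at {self.current_id}"
            )
        self.snapshots[snap.snapshot_id] = snap
        self.current_id = snap.snapshot_id
        return snap.snapshot_id

    def scan(self, snapshot_id: Optional[int] = None) -> List[dict]:
        # Time travel: read any committed snapshot, defaulting to the latest.
        sid = self.current_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        return [row for path in self.snapshots[sid].files for row in self.storage[path]]
```

A writer that loses the race gets `CommitConflict` and its data file is left unreferenced, so readers never see a half-written table. Real formats persist the snapshot metadata as files (Iceberg metadata and manifests, Delta Lake's `_delta_log`), rely on the catalog or storage layer for the atomic swap, and clean up orphaned files later.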
🥇

Medallion Architecture

Bronze → Silver → Gold progressive data refinement

A layered data-quality pattern: raw data lands in Bronze, is cleaned and conformed in Silver, and is aggregated into trusted business metrics in Gold. Because each layer is persisted and independently queryable, any Gold metric can be traced back through Silver to the raw Bronze records it came from.

Delta Lake · PySpark · dbt · Unity Catalog · Databricks
Diagram: medallion data flow
Sources: files / CSV · databases · APIs · event streams
🟫 Bronze, raw ingestion: as-is, no transforms · schema-on-read · full history kept (Delta Lake on S3 / ADLS / GCS; PySpark · Auto Loader)
⬜ Silver, cleansed and conformed: deduplication · type enforcement · business keys joined (Delta Lake; dbt models; PySpark · Great Expectations)
🏆 Gold, business-ready: aggregated metrics · denormalized facts · trusted KPIs (Delta Lake; dbt Gold models; Unity Catalog · Databricks)
Consumers: BI / Tableau · ML models · dashboards
Platforms: Databricks · Delta Lake · dbt · Unity Catalog · Apache Spark
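The Bronze, Silver, and Gold steps can be sketched in plain Python to show what each layer is responsible for. This is an illustration only, assuming a hypothetical orders feed with `order_id`, `customer`, `amount`, and `_ingested_at` fields; in the stack above the same logic would live in PySpark jobs and dbt models over Delta tables.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Tuple

# Bronze: records exactly as ingested. Strings, duplicates and bad rows are all kept.
BRONZE: List[dict] = [
    {"order_id": "A1", "customer": "c1", "amount": "10.50", "_ingested_at": 1},
    {"order_id": "A1", "customer": "c1", "amount": "12.00", "_ingested_at": 2},  # later correction
    {"order_id": "A2", "customer": "c2", "amount": "n/a", "_ingested_at": 1},    # fails typing
    {"order_id": "A3", "customer": "c1", "amount": "5", "_ingested_at": 1},
]


def to_silver(bronze: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Deduplicate on the business key, then enforce types; return (silver, rejected)."""
    latest: Dict[str, dict] = {}
    for rec in bronze:
        key = rec["order_id"]
        if key not in latest or rec["_ingested_at"] > latest[key]["_ingested_at"]:
            latest[key] = rec  # keep only the newest version of each order
    silver, rejected = [], []
    for rec in latest.values():
        try:
            amount = Decimal(rec["amount"])
        except InvalidOperation:
            rejected.append(rec)  # quarantine instead of silently dropping
            continue
        # _ingested_at is carried forward so each Silver row traces back to its Bronze record.
        silver.append({"order_id": rec["order_id"], "customer": rec["customer"],
                       "amount": amount, "_ingested_at": rec["_ingested_at"]})
    return silver, rejected


def to_gold(silver: List[dict]) -> Dict[str, Decimal]:
    """Aggregate Silver rows into a business metric: revenue per customer."""
    revenue: Dict[str, Decimal] = defaultdict(Decimal)
    for row in silver:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)
```

Bronze is never modified, so if the Silver rules change, Silver and Gold can be rebuilt from it; the rejected rows give a place to attach data-quality checks such as Great Expectations.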
🕸️

Data Mesh

Domain-owned data products with federated governance

Decentralises data ownership to business domains (Marketing, Sales, Product, Finance), each producing and owning their own data products. A federated governance layer enforces shared standards and contracts without centralising control — enabling scale without a monolithic data team.

Domain-Driven Design · Kafka · Data Contracts · dbt · OpenLineage
Diagram: data mesh, decentralised domain ownership
Marketing domain: data product Customer 360 (Kafka, dbt)
Sales domain: data product Revenue Metrics (Snowflake, dbt)
Product domain: data product Usage Events (Databricks, Kafka)
Finance domain: data product P&L Reports (dbt, Airflow)
Federated governance and data contracts: schema registry · SLAs · discoverability · access policies · interoperability standards (OpenLineage, data contracts, Collibra / Atlan, Apache Atlas)
Self-serve data platform: storage · pipelines · catalog · compute, which domain teams consume without raising platform tickets (Kafka, Airflow, dbt, Terraform, Kubernetes)
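A data contract is what lets federated governance work without central control: the governance layer defines the contract format, and each domain checks its own data product against it before publishing. The sketch below assumes a deliberately simple contract (required columns with Python types, plus a freshness SLA); `DataContract` and `validate` are hypothetical names, not the API of any contract tool listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass(frozen=True)
class DataContract:
    product: str              # data product name published by a domain
    owner: str                # owning domain team
    schema: Dict[str, type]   # required columns and their Python types
    max_staleness: timedelta  # freshness SLA


def validate(contract: DataContract, rows: List[dict],
             produced_at: datetime, now: datetime) -> List[str]:
    """Return a list of contract violations; an empty list means the batch may be published."""
    errors = []
    if now - produced_at > contract.max_staleness:
        errors.append(f"{contract.product}: data is {now - produced_at} old, "
                      f"SLA is {contract.max_staleness}")
    for i, row in enumerate(rows):
        missing = contract.schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for col, expected in contract.schema.items():
            if col in row and not isinstance(row[col], expected):
                errors.append(f"row {i}: {col} is {type(row[col]).__name__}, "
                              f"expected {expected.__name__}")
    return errors


# Example: the Sales domain checks a Revenue Metrics batch before publishing it.
revenue_contract = DataContract(
    product="revenue_metrics", owner="sales",
    schema={"region": str, "revenue": float}, max_staleness=timedelta(hours=1),
)
```

Extra columns are deliberately allowed, on the assumption that additive changes should not break consumers; removing or retyping a contracted column is flagged.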

Streaming Lakehouse

Real-time ingestion into the lakehouse: analytics within milliseconds to seconds of an event

Extends the lakehouse with a streaming layer: events flow from producers through Apache Kafka, are processed and enriched by Apache Flink or Spark Structured Streaming, and land directly in Iceberg or Delta Lake tables. The same tables serve both real-time dashboards and offline ML training.

Apache Kafka · Apache Flink · Spark Streaming · Apache Iceberg · Tecton
Diagram: streaming lakehouse data flow
Sources: app events · IoT sensors · transactions · clickstream
Apache Kafka (event streaming bus, millisecond latency): topics / partitions · consumer groups · Schema Registry · Kafka Connect · Confluent / MSK
Stream processing: Apache Flink (stateful processing) or Spark Structured Streaming (micro-batch); windowing · aggregations · joins and enrichment · deduplication · feature computation
Lakehouse storage: Apache Iceberg or Delta Lake; ACID writes · upserts / merges · time travel · schema evolution
Feature store: Tecton / Feast
Real-time consumers: live dashboards · AI / ML models · fraud alerts · live app APIs
End-to-end latency: milliseconds to seconds. The same table serves both streaming queries and batch analytics.
Platforms: Apache Kafka · Apache Flink · Spark Streaming · Apache Iceberg · Databricks · Tecton / Feast
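The windowing and deduplication steps in the processing layer can be illustrated with a small event-time aggregator. This is a plain-Python sketch of the concepts (tumbling windows, a watermark with allowed lateness, and deduplication of redelivered events), not Flink or Spark code; the event fields `event_id`, `ts`, `key`, and `amount` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def windowed_totals(events: Iterable[dict], window_s: int = 60,
                    allowed_lateness_s: int = 30) -> Iterator[Tuple[int, str, float]]:
    """Yield (window_start, key, total) per tumbling event-time window."""
    seen: Set[str] = set()  # ids already processed (unbounded here; real engines expire this state)
    open_windows: Dict[Tuple[int, str], float] = defaultdict(float)
    watermark = float("-inf")  # no event older than this is still expected
    for ev in events:
        if ev["event_id"] in seen:
            continue  # duplicate from an at-least-once redelivery
        seen.add(ev["event_id"])
        start = ev["ts"] - ev["ts"] % window_s
        if start + window_s <= watermark:
            continue  # too late: this window has already been emitted
        open_windows[(start, ev["key"])] += ev["amount"]
        watermark = max(watermark, ev["ts"] - allowed_lateness_s)
        # Emit and forget every window that ends at or before the watermark.
        for start_key in sorted(open_windows):
            if start_key[0] + window_s <= watermark:
                yield (*start_key, open_windows.pop(start_key))
    for start_key in sorted(open_windows):  # end of a bounded stream: flush the rest
        yield (*start_key, open_windows[start_key])


events = [
    {"event_id": "e1", "ts": 5, "key": "u1", "amount": 10.0},
    {"event_id": "e2", "ts": 20, "key": "u1", "amount": 5.0},
    {"event_id": "e2", "ts": 20, "key": "u1", "amount": 5.0},   # redelivered duplicate
    {"event_id": "e3", "ts": 70, "key": "u1", "amount": 1.0},
    {"event_id": "e4", "ts": 95, "key": "u2", "amount": 2.0},
    {"event_id": "e5", "ts": 10, "key": "u1", "amount": 100.0},  # arrives after its window closed
]
```

Running `list(windowed_totals(events))` counts `e2` once, closes the first window when the watermark passes 60, and drops the late `e5`. The allowed-lateness value trades result latency against completeness, which is the same knob that watermarks expose in the streaming engines above.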
🧩

Composable Data Platform

Best-of-breed modular stack — each layer independently swappable

Replaces a monolithic data platform with a pipeline of best-of-breed, independently swappable modules: Airbyte or Fivetran for ingestion, Snowflake or Databricks for storage, dbt for transformation, Airflow for orchestration, and Hex or Metabase for serving. Because any one layer can be replaced without rebuilding the others, vendor lock-in is limited to individual, replaceable components.

dbt · Fivetran · Airbyte · Airflow · Hex · Snowflake
Diagram: composable data platform, best-of-breed modular stack (each layer independently swappable; teams own their layer)
Ingest: 300+ connectors · change data capture · ELT pattern (Airbyte, Fivetran, Kafka Connect)
Store: elastic compute · pay-per-use · open table formats (Snowflake, Databricks, BigQuery)
Transform: Git-based · modular · tested transformations (dbt Core, dbt Cloud, Spark SQL)
Orchestrate: DAG-based · retry logic · SLA monitoring (Apache Airflow, Prefect, Dagster)
Serve: self-service analytics · embedded · APIs (Hex, Metabase, Tableau)
Cross-cutting: data catalog (Atlan, DataHub) · data contracts · observability (Monte Carlo, Elementary) · Git / CI-CD
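What makes a layer "swappable" is that its neighbours depend only on an interface, not on a particular product. The sketch below shows that idea with Python `typing.Protocol` classes; `Ingestor`, `Store`, `Transform`, and every concrete class are hypothetical stand-ins, not the APIs of Airbyte, Snowflake, or dbt.

```python
from typing import Dict, List, Protocol, Tuple

Rows = List[dict]


class Ingestor(Protocol):
    def extract(self) -> Rows: ...


class Store(Protocol):
    def write(self, table: str, rows: Rows) -> None: ...
    def read(self, table: str) -> Rows: ...


class Transform(Protocol):
    def run(self, store: Store) -> None: ...


class StaticIngestor:
    """Stand-in for a connector: returns a fixed batch of source rows."""
    def __init__(self, rows: Rows) -> None:
        self.rows = rows

    def extract(self) -> Rows:
        return list(self.rows)


class TableStore:
    """One storage backend: tables held as name -> rows."""
    def __init__(self) -> None:
        self.tables: Dict[str, Rows] = {}

    def write(self, table: str, rows: Rows) -> None:
        self.tables[table] = list(rows)

    def read(self, table: str) -> Rows:
        return list(self.tables.get(table, []))


class LogStore:
    """A different backend with the same interface: an append-only log of writes."""
    def __init__(self) -> None:
        self.log: List[Tuple[str, Rows]] = []

    def write(self, table: str, rows: Rows) -> None:
        self.log.append((table, list(rows)))

    def read(self, table: str) -> Rows:
        for name, rows in reversed(self.log):  # latest write wins
            if name == table:
                return list(rows)
        return []


class ActiveCustomers:
    """A transformation step that touches storage only through the Store interface."""
    def run(self, store: Store) -> None:
        raw = store.read("raw_customers")
        store.write("active_customers", [r for r in raw if r["active"]])


def run_pipeline(ingestor: Ingestor, store: Store, transforms: List[Transform],
                 landing_table: str = "raw_customers") -> Store:
    # An orchestrator (Airflow, Prefect, Dagster) would schedule and retry this function.
    store.write(landing_table, ingestor.extract())
    for step in transforms:
        step.run(store)
    return store
```

Swapping `TableStore` for `LogStore` changes nothing upstream or downstream, which is the property the composable stack relies on at each layer.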
🔮

Data Fabric

AI-powered unified data management across hybrid infrastructure

A metadata-driven architecture that uses AI and machine learning to discover, classify, and connect data across cloud, on-premises, and SaaS systems. A central active metadata hub (Atlan, Collibra, DataHub) provides lineage, governance, and policy enforcement across all of these silos.

OpenLineage · Collibra · Atlan · DataHub · Apache Atlas
Diagram: data fabric, AI-powered unified data management
Connected systems: cloud data warehouse (Snowflake, BigQuery) · on-premises systems (Oracle, SAP, ERP) · SaaS apps (Salesforce, HubSpot) · streaming platform (Kafka, Kinesis) · data lake (S3, ADLS, GCS) · AI / ML platform (Databricks, SageMaker)
Active metadata intelligence hub: AI-powered discovery · lineage · governance · policy enforcement
Tools: OpenLineage · Collibra · Atlan · Apache Atlas · DataHub · Monte Carlo
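The lineage and policy-enforcement roles of the metadata hub can be sketched as a graph problem: collect job-level lineage events, build dataset-to-dataset edges, and propagate classification tags (such as PII) to everything derived from a tagged source. The event shape below is only loosely modeled on OpenLineage run events, which carry input and output datasets; the simplified field names, dataset names, and helper functions are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def build_lineage(events: Iterable[dict]) -> Dict[str, Set[str]]:
    """Turn job run events into dataset -> downstream-dataset edges."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for ev in events:
        for src in ev["inputs"]:
            for dst in ev["outputs"]:
                edges[src].add(dst)
    return edges


def downstream(edges: Dict[str, Set[str]], start: str) -> Set[str]:
    """Every dataset reachable from `start` (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def propagate_tags(edges: Dict[str, Set[str]],
                   tags: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Copy each dataset's classification tags (e.g. "pii") to everything derived from it."""
    result: Dict[str, Set[str]] = defaultdict(set, {k: set(v) for k, v in tags.items()})
    for dataset, labels in tags.items():
        for derived in downstream(edges, dataset):
            result[derived] |= labels
    return dict(result)


# Hypothetical lineage collected from jobs running across three different systems.
EVENTS = [
    {"job": "load_contacts", "inputs": ["salesforce.contacts"],
     "outputs": ["snowflake.stg_contacts"]},
    {"job": "build_c360", "inputs": ["snowflake.stg_contacts", "oracle.orders"],
     "outputs": ["snowflake.customer_360"]},
    {"job": "refresh_dashboard", "inputs": ["snowflake.customer_360"],
     "outputs": ["tableau.churn_dashboard"]},
]
```

Tagging only `salesforce.contacts` as PII is enough for the dashboard built three hops later to inherit the tag, so an access policy keyed on that tag can follow the data across systems.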

Ready to Build One of These?

Our architects have delivered all of these patterns in production. Let's discuss your use case.
