Case Studies
Real engagements. Anonymized clients. Outcomes that moved the needle.
300TB Oracle DW Migration to Snowflake on AWS
6 weeks
to full production cutover
94%
reduction in pipeline failures
60%
reduction in infrastructure cost
0
data loss incidents
The Challenge
A regional financial institution had been operating a 15-year-old Oracle data warehouse holding over 300TB of data, supported by hundreds of Informatica ETL jobs running nightly batch pipelines. Batch failures were a near-daily occurrence, the team spent 30% of its time firefighting, and cloud migration had stalled after two failed in-house attempts.
Our Approach
- Performed a full data estate audit — catalogued 340 Informatica mappings, 12 source systems, and 80+ downstream BI reports
- Designed a phased migration strategy: non-critical domains first, core financial data last to de-risk the cutover
- Replaced Informatica with dbt + Apache Airflow on AWS MWAA, rewriting critical mappings with modern ELT patterns
- Ran parallel validation for 8 weeks — row-count checks, statistical profiling, and reconciliation reports before cutover (a sketch of one such check follows this list)
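To give a flavour of the parallel-validation step, here is a minimal sketch of a row-count reconciliation between the legacy Oracle warehouse and the Snowflake target. Connection details, credentials, and table names are illustrative placeholders, not the client's actual configuration.

```python
# Hypothetical parallel-run check: compare row counts for migrated tables
# between the legacy Oracle warehouse and the new Snowflake target.
import oracledb
import snowflake.connector

TABLES = ["GL_TRANSACTIONS", "CUSTOMER_ACCOUNTS"]  # illustrative table names

ora = oracledb.connect(user="audit", password="...", dsn="legacy-dw:1521/ORCL")
snow = snowflake.connector.connect(
    user="audit", password="...", account="example-account",
    warehouse="VALIDATION_WH", database="FIN_DW", schema="CORE",
)

def count_rows(conn, table: str) -> int:
    """Run a simple COUNT(*) against one table on either platform."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

for table in TABLES:
    source, target = count_rows(ora, table), count_rows(snow, table)
    status = "OK" if source == target else "MISMATCH"
    print(f"{table}: oracle={source} snowflake={target} [{status}]")
```

In the real engagement, checks like this ran alongside statistical profiling and full reconciliation reports; the cutover only proceeded once every domain passed cleanly.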
Real-Time Inventory Signal Across 200 Store Locations
< 5 min
data latency (was 48h)
40%
reduction in analyst prep time
3×
faster stockout detection
200
store locations on one platform
The Challenge
A national retail chain was managing inventory across 200 store locations using nightly Excel exports and manual reconciliation. Analysts were working against a 48-hour data lag, stockout events were only identified after the fact, and the central BI team was spending 40% of its time wrangling flat files instead of generating insights.
Our Approach
- Designed a streaming ingestion layer using Kafka to capture point-of-sale and inventory events in real time from all 200 locations
- Built Spark Streaming jobs on Databricks to cleanse, enrich, and model events into a lakehouse using the Medallion architecture (see the ingestion sketch after this list)
- Modelled inventory metrics with dbt on top of the Gold layer — single source of truth for stock levels, sell-through rates, and replenishment signals
- Delivered Power BI dashboards with sub-5-minute refresh, replacing the 48-hour Excel reports
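The ingestion step looks roughly like the sketch below: a Spark Structured Streaming job reading point-of-sale events from Kafka and landing them in a Bronze Delta table. It assumes a Databricks cluster where `spark` is already provided; the topic name, event schema, and storage paths are illustrative assumptions.

```python
# Minimal sketch: Kafka -> Bronze Delta table on Databricks (Structured Streaming).
# Assumes `spark` is the SparkSession provided by the Databricks runtime.
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

# Illustrative event schema for point-of-sale / inventory events.
event_schema = StructType([
    StructField("store_id", StringType()),
    StructField("sku", StringType()),
    StructField("quantity", IntegerType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
    .option("subscribe", "pos-inventory-events")          # placeholder topic
    .load()
)

# Kafka delivers the payload as bytes; parse the JSON body into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

(
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/pos_bronze")
    .outputMode("append")
    .start("/mnt/lake/bronze/pos_inventory")
)
```

Silver and Gold layers then cleanse and aggregate these events, and dbt models the inventory metrics that feed the Power BI dashboards.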
GenAI Document Processing & Patient Risk Scoring Platform
80%
reduction in manual data prep
3 days → 4 hours
document processing cycle time
87%
accuracy on risk scoring model
100%
lineage tracked end-to-end
The Challenge
A healthcare provider was processing thousands of clinical documents per month — referral letters, discharge summaries, lab reports — almost entirely by hand. Analysts spent 60% of their time on data extraction and normalization before any analysis could begin, and the clinical team had no predictive tooling to identify high-risk patients before their condition deteriorated.
Our Approach
- Built a data quality foundation first — automated ingestion of clinical documents into Azure Data Lake with standardized schemas and lineage tracking
- Designed a GenAI extraction pipeline using LangChain + Azure OpenAI to extract structured entities (diagnoses, medications, vitals) from unstructured clinical notes (see the extraction sketch after this list)
- Implemented a validation layer using data contracts to ensure extracted entities met clinical schema standards before persisting to the data warehouse
- Built ML models on Azure ML for patient risk scoring using the cleaned, structured signal — trained on 3 years of historical outcome data
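The extraction pipeline can be sketched as below: a LangChain chain that asks an Azure OpenAI deployment to return entities as a typed schema. The deployment name, API version, prompt, and entity fields are illustrative assumptions, not the production setup; the Azure endpoint and API key are assumed to come from environment variables.

```python
# Sketch of the GenAI extraction step: structured clinical entities from an
# unstructured note via LangChain + Azure OpenAI. All names are illustrative.
from pydantic import BaseModel, Field
from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

class ClinicalEntities(BaseModel):
    """Target schema for extracted entities; fields are illustrative."""
    diagnoses: list[str] = Field(default_factory=list)
    medications: list[str] = Field(default_factory=list)
    vitals: dict[str, str] = Field(default_factory=dict)

# Endpoint and key are read from AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY.
llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-06-01", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract diagnoses, medications, and vitals from the clinical note."),
    ("human", "{note}"),
])

# with_structured_output binds the Pydantic schema so the model returns typed fields.
extractor = prompt | llm.with_structured_output(ClinicalEntities)

entities = extractor.invoke(
    {"note": "Discharge summary: Type 2 diabetes. Metformin 500mg. BP 142/90."}
)
print(entities.model_dump())
```

The validation layer described in the list then checks these typed fields against the clinical data contract before anything is persisted to the warehouse, which is what keeps lineage fully tracked end-to-end.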
Ready to be the next success story?
Tell us your challenge — we'll tell you how we'd approach it.
Start a Conversation