
ANZ Plus Customer Segmentation

Australia and New Zealand Bank (ANZ) · 2019 – 2023

Built the foundational customer segmentation model for ANZ Plus, identifying 12 behavioural cohorts that drove phased product adoption across 2.4M customers.

The problem

ANZ was launching ANZ Plus, a greenfield neo-banking product. There was no existing segmentation framework to build on, and the engineering team could not build every feature at once. Product and engineering leadership therefore needed a data-driven way to decide which customer cohorts to prioritise, and in what order, to maximise early adoption.

Architecture

Figure: ANZ Plus customer segmentation architecture. Data flows from Teradata (on-premises) and GCP BigQuery (cloud data warehouse) into a unified feature store on GCP holding merged, cleaned data. Feature engineering covers spending patterns (transactions, amounts), tech and app usage (PayID, mobile, checks), and demographics (age, account tenure). Modelling evaluated K-Means (rejected), DBSCAN (rejected), and Gaussian mixture (selected, best separation), producing 12 distinct behavioural customer cohorts. Output: a phased rollout plan (Beta → Alpha → MVP) and the ANZ Plus roadmap covering 2.4M customers across phases.

Approach

Built a behavioural clustering model from scratch using customer attributes spanning spending patterns, technology usage, age, mobile app engagement, and transaction behaviours. Sourced and unified data across two environments — on-premises Teradata and GCP — before modelling. Evaluated K-Means, DBSCAN, and Gaussian Mixture Models, selecting the final approach by visually inspecting cluster distributions and assessing separation quality. Identified 12 distinct customer cohorts, then worked iteratively with business teams to refine cohort definitions against real-world feature delivery constraints from the mobile engineering teams. Deployed the final model on GCP.

Key decisions and trade-offs

Clustering methodology selection

Evaluated K-Means, DBSCAN, and Gaussian Mixture Models. Rather than defaulting to K-Means, visualised the cluster distributions for each approach and compared how cleanly the points separated. Gaussian Mixture gave the best separation and was selected; it also produced the most defensible cohort structure for stakeholder communication.
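
A minimal sketch of this kind of side-by-side comparison, using scikit-learn on synthetic data. It is illustrative only: the data, the `eps` and `min_samples` values, the cluster count of 4, and the helper names `fit_candidates` and `separation_report` are all assumptions, and the silhouette score is offered here as a numeric complement to the visual inspection actually used on the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer feature matrix (hypothetical data).
X, _ = make_blobs(n_samples=600, centers=4, n_features=5,
                  cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)


def fit_candidates(X, n_clusters, eps=0.8, min_samples=10, seed=0):
    """Fit the three candidate methods and return {name: cluster labels}."""
    return {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=seed).fit_predict(X),
        "dbscan": DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X),
        "gmm": GaussianMixture(n_components=n_clusters,
                               random_state=seed).fit_predict(X),
    }


def separation_report(X, labelings):
    """Summarise cluster count, noise share, and silhouette per method."""
    report = {}
    for name, labels in labelings.items():
        mask = labels != -1  # DBSCAN marks noise points with -1
        n_found = len(set(labels[mask]))
        # Silhouette is only defined when at least two clusters exist.
        score = (silhouette_score(X[mask], labels[mask])
                 if n_found >= 2 else float("nan"))
        report[name] = {
            "n_clusters": n_found,
            "noise_share": float((~mask).mean()),
            "silhouette": float(score),
        }
    return report


labelings = fit_candidates(X, n_clusters=4)
report = separation_report(X, labelings)
for name, stats in report.items():
    print(name, stats)
```

The same label arrays can be fed into a 2D projection plot for each method, which is closer to how the comparison was judged in practice.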

Feature set design

Deliberately combined behavioural signals (spending patterns, transaction types, PayID usage) with demographic and engagement signals (age, mobile app usage) to produce cohorts that were both statistically distinct and actionable for product prioritisation.
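
As an illustration of putting these mixed signals on a common scale before clustering, the sketch below log-transforms the skewed count and amount features and standardises everything. The column names, the sample values, and the choice of `log1p` plus `StandardScaler` are assumptions for the example, not the project's actual feature definitions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical customer-level features; names and values are illustrative.
customers = pd.DataFrame({
    "monthly_spend": [120.0, 3400.0, 560.0, 45.0, 980.0],
    "txn_count": [8, 95, 30, 3, 41],
    "payid_txn_count": [0, 12, 4, 0, 7],
    "app_sessions_30d": [2, 60, 25, 0, 33],
    "age": [67, 29, 41, 73, 35],
    "tenure_months": [240, 18, 96, 310, 60],
})

# Behavioural and engagement counts tend to be right-skewed,
# so compress them with log1p before scaling.
skewed = ["monthly_spend", "txn_count", "payid_txn_count", "app_sessions_30d"]
# Demographic features are scaled directly.
plain = ["age", "tenure_months"]

preprocess = ColumnTransformer([
    ("skewed", Pipeline([
        ("log", FunctionTransformer(np.log1p)),
        ("scale", StandardScaler()),
    ]), skewed),
    ("plain", StandardScaler(), plain),
])

# Each output column has zero mean and unit variance, so no single
# signal dominates the distance calculations used in clustering.
features = preprocess.fit_transform(customers)
print(features.shape)
```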

GCP as deployment target

Deployed on GCP, the organisation's preferred cloud platform. Because a portion of the input data already resided there, this also simplified data access and reduced pipeline complexity.

Challenges

Data unification across environments

Sourcing and unifying training data split across on-premises Teradata and GCP required non-trivial pipeline work before modelling could begin.
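
A rough sketch of the unification step, assuming each environment yields a customer-level extract as a pandas DataFrame. The extracts, the column names, the `customer_id` join key, and the `unify` helper are hypothetical; the point is to deduplicate on the key and report coverage so customers missing from either side are visible before modelling.

```python
import pandas as pd

# Hypothetical extract from the on-premises warehouse.
teradata_extract = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [67, 29, 41, 73],
    "tenure_months": [240, 18, 96, 310],
})

# Hypothetical extract from the cloud warehouse (note the duplicate id 5).
bigquery_extract = pd.DataFrame({
    "customer_id": [2, 3, 4, 5, 5],
    "payid_txn_count": [12, 4, 0, 7, 7],
    "app_sessions_30d": [60, 25, 0, 33, 33],
})


def unify(onprem, cloud, key="customer_id"):
    """Join the two extracts on the key and report per-source coverage."""
    onprem = onprem.drop_duplicates(subset=key)
    cloud = cloud.drop_duplicates(subset=key)
    # An outer join with indicator=True flags rows found on one side only.
    merged = onprem.merge(cloud, on=key, how="outer",
                          indicator=True, validate="one_to_one")
    coverage = {
        side: int((merged["_merge"] == side).sum())
        for side in ("left_only", "right_only", "both")
    }
    # Keep only customers with features from both environments.
    both = (merged[merged["_merge"] == "both"]
            .drop(columns="_merge")
            .reset_index(drop=True))
    return both, coverage


unified, coverage = unify(teradata_extract, bigquery_extract)
print(coverage)
print(unified)
```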

Iterative cohort refinement

With no prior segmentation as a reference point, cluster definitions went through multiple iterations — requiring close collaboration with business teams to validate cohorts against the realities of feature delivery timelines and mobile engineering capacity.

Outcome

Identified 12 customer cohorts that structured the entire ANZ Plus phased rollout. Phase 1 (simple transfers) onboarded ~250K customers in beta. Phase 2 (PayID, bank transfers, savings buckets) added ~1M customers. Phase 3 (MVP including home loans) brought the total to 2.4M customers across all phases.

My role

Owned the segmentation modelling end-to-end, supported by a data analyst I was managing. Collaborated with the Data Area Lead on stakeholder communications and contributed directly to presenting model outcomes to senior business stakeholders.

What I would do differently

I would take on more explicit leadership of the workstream rather than deferring upward. I would also ensure alignment with the mobile engineering team happened before modelling began — rather than iterating cohort definitions reactively against feature delivery constraints that were already locked.

Segmentation · Clustering · Neo-banking · Product Analytics · GCP
Python · scikit-learn · K-Means · DBSCAN · Gaussian Mixture Models · GCP · Teradata · SQL · pandas