
Maintenance Log Classification and Fault Forecasting

Jetstar Airways · 2023 – Present

NLP pipeline that classifies aircraft maintenance logs and forecasts recurring fault patterns, reducing mean time to identify faults by 30%.

The problem

Aircraft maintenance and engineering logs were entered manually by technicians using domain-specific shorthand, making automated classification impractical without significant domain expertise. Unclear entries previously required manual expert review, creating bottlenecks in maintenance scheduling. The Operations Lead identified this as a high-value area for data science, with a well-defined problem scope from the outset.

Architecture

Figure: Layered architecture of the maintenance log classification and fault forecasting system, in four stages.
Input: maintenance management system (technician log entries and fault codes).
Pre-processing: text normalisation (lowercase, tokenise, filter); stopword removal (NLTK plus aviation-specific terms); TF-IDF vectorisation (bigrams, 10k features); back-translation (TextBlob, tech-speak to English).
NLP and ML architecture: fault code repository (structured feature extraction); feature engineering (SME-informed, domain-specific features); NLP classifier (semi-supervised log categorisation); LightGBM forecaster (fault pattern prediction).
Output: classified logs and forecasts (category labels and maintenance windows); maintenance management system (classified logs and scheduling; parts procurement and spare aircraft); Power BI dashboard (critical tail reporting; predicted maintenance visibility).

Approach

Designed and delivered an NLP pipeline for automated classification of aircraft maintenance logs, supported by domain-specific feature engineering. Structured features were extracted from the unstructured log text and linked to the fault code repository.

Pre-processing involved text normalisation, stopword removal using NLTK supplemented with aviation-specific terms, and TF-IDF vectorisation including bigram features, capped at 10,000 features.

Data augmentation used TextBlob back-translation to convert technician shorthand into plain English, introducing linguistic variation and improving robustness for underrepresented fault categories such as runway debris and micro-fractures.

The classification system was then extended to forecast recurring fault patterns using LightGBM, enabling proactive maintenance scheduling so that parts were in stock and operational spare aircraft were available before faults occurred. The work was delivered in collaboration with SkyWise developers and a Data Visualisation Analyst who built the Power BI reporting layer.
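The pre-processing steps described in this section can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the production code: a small hard-coded stopword list stands in for NLTK's English list (which requires a corpus download), the aviation-specific stopwords are hypothetical examples, and unigrams are kept alongside bigrams as an assumption. The vocabulary is capped by corpus frequency and IDF is smoothed; in practice, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2), max_features=10_000, stop_words=...)` provides similar behaviour in one call.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Stand-in for NLTK's English stopword list (which needs a corpus download).
GENERAL_STOPWORDS = {"the", "a", "an", "and", "of", "to", "on", "in", "at", "was", "is", "for"}
# Hypothetical aviation-specific stopwords: frequent boilerplate with little class signal.
AVIATION_STOPWORDS = {"acft", "aircraft", "iaw", "amm", "ref"}
STOPWORDS = GENERAL_STOPWORDS | AVIATION_STOPWORDS


def normalise(text: str) -> list[str]:
    """Lowercase, tokenise on alphanumeric runs, drop stopwords and 1-character tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def ngrams(tokens: list[str]) -> list[str]:
    """Unigrams plus bigrams of adjacent tokens (after stopword removal)."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def fit_tfidf(docs: list[str], max_features: int = 10_000) -> tuple[list[str], dict[str, float]]:
    """Keep the max_features most frequent n-grams and compute smoothed IDF weights."""
    grams = [ngrams(normalise(d)) for d in docs]
    corpus_counts = Counter(g for doc in grams for g in doc)
    # Rank by corpus frequency; break ties alphabetically so the vocabulary is deterministic.
    vocab = sorted(corpus_counts, key=lambda g: (-corpus_counts[g], g))[:max_features]
    doc_freq = Counter(g for doc in grams for g in set(doc))
    n = len(docs)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {g: math.log((1 + n) / (1 + doc_freq[g])) + 1 for g in vocab}
    return vocab, idf


def transform(doc: str, vocab: list[str], idf: dict[str, float]) -> list[float]:
    """Raw term count times IDF for each vocabulary n-gram, then L2-normalised."""
    counts = Counter(ngrams(normalise(doc)))
    vec = [counts[g] * idf[g] for g in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Fitting on a handful of entries such as `"Hydraulic leak at main landing gear actuator"` produces bigrams like `"main landing"` and `"landing gear"`, which carry more fault-specific signal than the individual words.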

Key decisions and trade-offs

NLP approach

A retrieval-augmented generation (RAG) architecture was not sanctioned for this project, given its timing. Instead, the pipeline was built on classical NLP with feature engineering that linked log text to the fault code repository, plus domain-specific pre-processing informed heavily by subject-matter expert (SME) input from aircraft engineers.
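As an illustration of linking free text to the fault code repository, the sketch below derives a few structured features per log entry: the ATA chapter implied by the logged fault code, the chapters implied by keywords or explicit "ATA nn" references in the text, and a flag for entries where the two disagree (the kind of mismatch noted under Challenges). The repository contents, the `FC…` code format, and the keyword map are all hypothetical stand-ins for the real repository and SME input.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical fault code repository: code -> (ATA chapter, system description).
FAULT_CODE_REPOSITORY = {
    "FC3241": ("32", "landing gear"),
    "FC2910": ("29", "hydraulic power"),
    "FC3320": ("33", "lights"),
}

# Hypothetical SME-supplied keyword map from free-text terms to ATA chapters.
KEYWORD_TO_ATA = {
    "tyre": "32", "brake": "32", "gear": "32",
    "hydraulic": "29", "hyd": "29",
    "light": "33", "lt": "33",
}

# Explicit chapter references written in the text, e.g. "ATA 32".
ATA_REF = re.compile(r"\bata\s*(\d{2})\b", re.IGNORECASE)


@dataclass
class LogFeatures:
    logged_chapter: str | None
    text_chapters: set[str]
    code_text_mismatch: bool
    has_explicit_ata_ref: bool


def extract_features(log_text: str, fault_code: str) -> LogFeatures:
    tokens = re.findall(r"[a-z0-9]+", log_text.lower())
    # Chapters implied by keywords, plus any chapters referenced explicitly.
    text_chapters = {KEYWORD_TO_ATA[t] for t in tokens if t in KEYWORD_TO_ATA}
    explicit = ATA_REF.findall(log_text)
    text_chapters.update(explicit)
    entry = FAULT_CODE_REPOSITORY.get(fault_code)
    logged_chapter = entry[0] if entry else None
    # Flag entries whose text points at chapters other than the logged code's
    # (unknown codes are flagged too whenever the text names a chapter).
    mismatch = bool(text_chapters) and logged_chapter not in text_chapters
    return LogFeatures(logged_chapter, text_chapters, mismatch, bool(explicit))
```

Features like `code_text_mismatch` can sit alongside the TF-IDF vector as extra columns, and the same flag doubles as a filter for entries worth routing to an engineer.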

Handling domain-specific language

Technician shorthand presented a significant vocabulary challenge. Back-translation via TextBlob was introduced to normalise tech-speak into standard English, effectively bridging the gap between unstructured log entries and a classifiable text format.
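TextBlob's back-translation relies on an external online translation service, so it cannot be reproduced offline here. The sketch below instead swaps in a simpler technique: a hypothetical shorthand glossary that expands technician abbreviations into plain English, used as the paraphrase function when topping up underrepresented categories. The glossary entries, category names, and `target` count are illustrative assumptions; a real back-translation step would plug in as a different `paraphrase` callable and would produce more varied text than a fixed glossary.

```python
from __future__ import annotations

import random
import re
from collections import Counter
from collections.abc import Callable

# Hypothetical shorthand glossary; in practice this would come from SME input.
SHORTHAND = {
    "inop": "inoperative",
    "hyd": "hydraulic",
    "l/h": "left hand",
    "r/h": "right hand",
    "u/s": "unserviceable",
    "rwy": "runway",
    "fod": "foreign object debris",
}

# Match slash abbreviations like "l/h" first, then ordinary alphanumeric words.
TOKEN = re.compile(r"[a-z]/[a-z]|[a-z0-9]+")


def expand_shorthand(text: str) -> str:
    """Lowercase, tokenise, and replace known shorthand with plain English."""
    return " ".join(SHORTHAND.get(t, t) for t in TOKEN.findall(text.lower()))


def augment_minority(
    logs: list[tuple[str, str]],
    paraphrase: Callable[[str], str],
    target: int,
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Top up each category below `target` examples with paraphrased copies."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in logs)
    augmented = list(logs)
    for label, n in counts.items():
        pool = [text for text, lab in logs if lab == label]
        for _ in range(max(0, target - n)):
            augmented.append((paraphrase(rng.choice(pool)), label))
    return augmented
```

For example, `expand_shorthand("Hyd leak L/H gear, INOP")` gives `"hydraulic leak left hand gear inoperative"`. Keeping the paraphraser as a parameter means the augmentation loop does not care whether variants come from a glossary, a translation service, or another model.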

Fault forecasting architecture

LightGBM was selected for the forecasting component given its strong performance on tabular data and interpretability for operational stakeholders. The model was scoped specifically to predict scheduled maintenance windows, directly informing parts procurement and spare aircraft planning.
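The page does not describe the forecaster's features or target. As a minimal sketch under stated assumptions, the classified logs could be turned into one training row per event, with the recent history of the same fault category on the same aircraft as features and "does this category recur on this tail within the next 30 days" as a binary target. The field names, the 90-day lookback, and the 30-day horizon are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, timedelta


def build_training_rows(
    events: list[tuple[str, str, date]],
    lookback_days: int = 90,
    horizon_days: int = 30,
) -> list[dict]:
    """One row per classified log event: recent history as features, future recurrence as target."""
    # Group event dates by (tail, fault category).
    by_key: dict[tuple[str, str], list[date]] = defaultdict(list)
    for tail, category, day in events:
        by_key[(tail, category)].append(day)

    rows = []
    for (tail, category), days in by_key.items():
        days.sort()
        for i, day in enumerate(days):
            past = [d for d in days[:i] if day - d <= timedelta(days=lookback_days)]
            future = [d for d in days[i + 1:] if d - day <= timedelta(days=horizon_days)]
            rows.append({
                "tail": tail,
                "category": category,
                "date": day,
                "count_lookback": len(past),
                # NaN marks "no previous event"; LightGBM treats NaN as missing by default.
                "days_since_last": (day - days[i - 1]).days if i > 0 else float("nan"),
                "recurs_within_horizon": int(bool(future)),
            })
    return rows
```

Rows like these, with identifiers dropped or encoded, are the kind of tabular input a gradient-boosted model such as `lightgbm.LGBMClassifier` is designed for. Splitting train and test sets by date rather than at random avoids leaking future recurrences into training.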

Challenges

Engineer availability

Aircraft engineers are operationally constrained and difficult to schedule for data science input. To work around this, preliminary analysis was run to group similar fault code complaints, allowing engineers to classify complaints in batches rather than one entry at a time, which significantly reduced the time burden on them.
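The page does not say how complaints were grouped. One simple way to build review batches, shown here as a swapped-in illustration rather than the method used, is to split entries by fault code and then greedily cluster their text by Jaccard similarity of word sets. The `threshold` value and the sample fault codes are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokens(text: str) -> frozenset[str]:
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def batch_for_review(logs: list[tuple[str, str]], threshold: float = 0.5) -> list[list[str]]:
    """Group (fault_code, text) entries into batches an engineer can label in one pass.

    Entries are split by fault code, then each entry joins the first batch whose seed
    entry is at least `threshold` Jaccard-similar; otherwise it starts a new batch.
    """
    by_code: dict[str, list[str]] = defaultdict(list)
    for code, text in logs:
        by_code[code].append(text)

    batches: list[list[str]] = []
    for code in sorted(by_code):
        seeds: list[frozenset[str]] = []
        code_batches: list[list[str]] = []
        for text in by_code[code]:
            toks = tokens(text)
            for seed, batch in zip(seeds, code_batches):
                if jaccard(seed, toks) >= threshold:
                    batch.append(text)
                    break
            else:  # no similar batch found: this entry seeds a new one
                seeds.append(toks)
                code_batches.append([text])
        batches.extend(code_batches)
    return batches
```

An engineer then assigns one label per batch instead of one per entry; a greedy single pass is order-dependent, which is acceptable when the goal is reducing review effort rather than producing a definitive clustering.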

Unlabelled and inconsistent data

The dataset was almost entirely unlabelled and contained inconsistent entries with mismatched fault codes, a result of freeform manual entry. Once a labelled subset had been established, it was used to infer labels across the broader dataset, effectively converting an unsupervised learning problem into a semi-supervised one.
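How labels were propagated is not specified. A common semi-supervised technique that fits the description is self-training: fit a simple classifier on the labelled subset, adopt its confident predictions on unlabelled entries as new labels, and repeat. The sketch below uses a nearest-centroid classifier over word counts; the classifier choice, `threshold`, and `max_rounds` are assumptions, and entries that never clear the threshold are returned for SME review.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bag(text: str) -> Counter[str]:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def self_train(
    labelled: dict[str, str],
    unlabelled: list[str],
    threshold: float = 0.5,
    max_rounds: int = 5,
) -> tuple[dict[str, str], list[str]]:
    """Self-training with a nearest-centroid classifier.

    Each round, unlabelled entries whose cosine similarity to a label's centroid
    reaches `threshold` receive that label and join the labelled pool. Returns the
    enlarged text -> label mapping and the entries left for SME review.
    """
    labelled = dict(labelled)  # do not mutate the caller's dict
    remaining = list(unlabelled)
    for _ in range(max_rounds):
        # Centroid = summed term counts per label (direction is what cosine compares).
        centroids: dict[str, Counter[str]] = {}
        for text, label in labelled.items():
            centroids.setdefault(label, Counter()).update(bag(text))
        newly, still = {}, []
        for text in remaining:
            scores = {lab: cosine(bag(text), c) for lab, c in centroids.items()}
            best = max(scores, key=scores.get, default=None)
            if best is not None and scores[best] >= threshold:
                newly[text] = best
            else:
                still.append(text)
        if not newly:
            break
        labelled.update(newly)
        remaining = still
    return labelled, remaining
```

Raising `threshold` trades coverage for precision: fewer entries are labelled automatically, and more are returned for engineers to review.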

Stakeholder buy-in

Early engagement with engineers was difficult. As model performance improved and engineers saw tangible uplift, their willingness to provide higher quality input increased organically, creating a positive feedback loop between model quality and SME engagement.

Outcome

Reduced mean time to identify faults by 30%. Enabled proactive maintenance scheduling with parts in stock and operational spare aircraft available ahead of predicted faults. Replaced manual expert review of ambiguous log entries with an automated classification pipeline.

My role

Took full ownership of the solution design from a well-defined operational brief. Designed the entire NLP and forecasting architecture, oversaw development in collaboration with SkyWise developers, and coordinated the Power BI reporting layer with a Data Visualisation Analyst.

What I would do differently

This project was an important learning milestone. It deepened my understanding of operational processes and how aircraft engineers think and work, which directly shaped how I approach domain-heavy problems. The delivery went smoothly and the experience built my confidence in taking ownership of future projects and identifying where data science can add genuine operational value.

NLP · Classical ML · Forecasting · Production ML
Python · NLTK · TF-IDF · TextBlob · LightGBM · Snowflake · Power BI · SkyWise