
Project Overview
A self-directed project applying core machine learning techniques to open-source coffee quality data. The analysis centred on relationships between sensory measures and combined four methods: geospatial analysis to explore regional trends, regression to predict one score from the others, clustering to identify sensory profiles, and time series analysis to examine seasonal patterns and production stability.
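As a rough illustration of the regression step, the sketch below fits one sensory score from two others by ordinary least squares. It is not the project's notebook code: the data is synthetic, the column names (`aroma`, `aftertaste`, `flavor`) and the coefficients used to generate it are assumptions made for the example, and plain numpy least squares stands in for the scikit-learn model used in the project.

```python
import numpy as np

# Synthetic stand-in for sensory scores; the real dataset is not loaded here.
rng = np.random.default_rng(0)
n = 200
aroma = rng.normal(7.5, 0.3, n)
aftertaste = rng.normal(7.4, 0.3, n)

# Hypothetical relationship, invented only so the fit has something to recover.
flavor = 0.5 * aroma + 0.4 * aftertaste + 0.8 + rng.normal(0, 0.05, n)

# Design matrix with an intercept column, solved by ordinary least squares.
X = np.column_stack([np.ones(n), aroma, aftertaste])
coef, *_ = np.linalg.lstsq(X, flavor, rcond=None)

# R^2: share of variance in the target explained by the fitted model.
pred = X @ coef
ss_res = np.sum((flavor - pred) ** 2)
ss_tot = np.sum((flavor - flavor.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("intercept, aroma, aftertaste:", np.round(coef, 2))
print("R^2:", round(float(r_squared), 3))
```

On real data the same pattern applies once the synthetic arrays are swapped for the dataset's score columns, ideally with a held-out test split so the reported fit is not measured on the training rows.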
Tools Used
- Excel – Data Preparation | Visualisation | Analysis
- Keynote – Presentation
- Python (Jupyter | Anaconda) – Scripting Environment
- pandas | numpy | os – Data Manipulation
- matplotlib | seaborn | pylab – Plotting | Visualisation
- scikit-learn | statsmodels – Machine Learning | Statistical Modelling
- folium – Geospatial Visualisation
- Tableau – Dashboard Design
Skills Demonstrated
- Script Writing
- Exploratory Data Analysis | Data Wrangling | Aggregation | Subsetting
- Linear Regression | Clustering (K-Means) | Model Evaluation
- Time Series Analysis | Stationarity Testing | Lag Analysis
- Geospatial Mapping
- Visualisation | Dashboard Design
Data Sourced
This analysis uses a modified version of data originally sourced from the Coffee Quality Institute, made available on Kaggle.
- Coffee Quality Dataset – Bean origin, variety, altitude, processing method, physical attributes, flavour metrics, and total quality score, with geospatial coordinates for most entries.
The dataset was accessed on 02 November 2024.
Key Insights
Insight.
Visual — Description
Abc.
Visual — Description
Abc.
Insight.
Visual — Description
Abc.
Visual — Description
Abc.
Insight.
Visual — Description
Abc.
Visual — Description
Abc.
Visual — Description
Abc.
Key Takeaways
Recommendations
- Genre Strategy
Abc.
- Regional Focus
Abc.
- Competitive Positioning
Abc.
- Digital Strategy
Abc.
Links & Deliverables
Tableau Dashboard — *(link to be added)*
