
I had been in something of a rut for the past few months, trying to work on projects that were clearly beyond my existing skill set. It took some time before I realized I should cut my losses and go back to simpler, shorter projects. Admittedly, I didn’t learn much that was new during this project, but it did serve as a good refresher.
Data Sources Used
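The digits dataset that comes built into scikit-learn. It contains 1,797 labeled images of handwritten digits, each 8x8 pixels. The dataset is a copy of the UCI optical recognition of handwritten digits test set; Gael Varoquaux is credited as the original author of scikit-learn’s digits classification example. It is loaded with the load_digits() function from sklearn.datasets.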
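The loading snippet did not survive in this copy of the post. A minimal sketch of that step, assuming nothing beyond scikit-learn's standard load_digits() function:

```python
from sklearn.datasets import load_digits

# load_digits() returns a Bunch, a dict-like object with array attributes.
digits = load_digits()

# data holds each image flattened to 64 features; images keeps the 8x8 layout.
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
print(digits.target.shape)  # (1797,)
```

Each row of digits.data is one image, and the matching entry in digits.target is its label (0 to 9).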
Methodology
Part One: Libraries Used
Part Two: Setting Up Dataframe
I used the load_digits() function, imported from sklearn.datasets, to set up the dataset. To preview the images and their corresponding labels, I used the algorithm provided by Michael Galarnyk. As I understand it, it works like this:
1. Create a blank figure.
2. Loop over the first 16 entries of digits.data and digits.target, calling them "image" and "label" respectively.
3. Create a subplot inside the blank figure, on a grid of 2 rows and 8 columns.
4. Place each image in its subplot, reshaped to (8, 8) and drawn in a monochrome colormap.
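The four steps above can be sketched as follows. This is a reconstruction from the description, not the original code; the figure size, the "gray" colormap, and the title format are assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()

# Step 1: create a blank figure (the size is an arbitrary choice).
fig = plt.figure(figsize=(16, 4))

# Step 2: loop over the first 16 (image, label) pairs.
for index, (image, label) in enumerate(zip(digits.data[:16], digits.target[:16])):
    # Step 3: add a subplot on a 2-row by 8-column grid (positions are 1-based).
    ax = fig.add_subplot(2, 8, index + 1)
    # Step 4: reshape the flat 64-value row to 8x8 and draw it in grayscale.
    ax.imshow(image.reshape(8, 8), cmap="gray")
    ax.set_title(f"Label: {label}")
    ax.axis("off")
```

Calling plt.show() afterwards (or fig.savefig with a file name) displays the grid. The reshape is needed because digits.data stores each image as a flat row of 64 values.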
Part Three: Train Test Split
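The split code is missing from this copy of the post. The accuracies reported in Part Five are all whole multiples of 1/450, which is consistent with a 25% test split of the 1,797 images (450 test samples), so the sketch below assumes scikit-learn's default test size. The random_state value is an arbitrary choice made here for reproducibility.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# test_size=0.25 is scikit-learn's default; random_state=0 is an assumed seed.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (1347, 64) (450, 64)
```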
Part Four: Testing Algorithms
Here, I have streamlined the process of testing an ML algorithm on a clean dataset. It’s a simple four-step process.
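The steps themselves are not listed in this copy. A common reading of the four-step pattern is: import the model, create an instance, fit it, then predict and score. The sketch below follows that reading. Wrapping each model in make_pipeline with a StandardScaler, the max_iter value, and the random seeds are all assumptions, so the scores it prints will not match the ones reported in this post.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0
)

# Steps 1 and 2: import each model class (above) and create an instance.
# The scaler inside each pipeline is an assumption of this sketch.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Support Vector Classifier": make_pipeline(StandardScaler(), SVC()),
    "Random Forests": make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0)),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

results = {}
for name, model in models.items():
    # Step 3: train on the training split.
    model.fit(X_train, y_train)
    # Step 4: predict on the test split and record the accuracy.
    results[name] = model.score(X_test, y_test)
    print(f"{name}: {results[name]:.4f}")
```

Keeping the models in a dictionary means adding another classifier is a one-line change, which is one way to read "streamlined" here.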
Part Five: Accuracy Results
- Logistic Regression: 0.9666666666666667
- Support Vector Classifier: 0.9844444444444445
- Random Forests: 0.8066666666666666
- K-Nearest Neighbors: 0.9733333333333334
Every model except the Random Forest classifier performs well, with accuracy above 96%. Maybe I’ll update the post if I find a better configuration for it.
Part Six: Heatmaps
Looking at the Random Forest heatmap, it is clear that the model struggles to recognize certain digits, especially 1 and 8, where it faltered the most.
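The heatmap images and their code are not present in this copy. A minimal sketch of how such a heatmap can be produced, assuming it shows a confusion matrix drawn with seaborn; the colormap, labels, and model settings below are assumptions rather than the original configuration.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0
)

# Default settings are used here, so the result will not match the 0.8067 above.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, model.predict(X_test))

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_xlabel("Predicted digit")
ax.set_ylabel("True digit")
ax.set_title("Random Forest confusion matrix")
```

Off-diagonal cells count misclassifications, so a digit the model struggles with shows up as a row with noticeable values away from the diagonal.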
As always, the full code can be found here.
REVIEW
As stated in the intro, this project was more of a refresher to get me out of a rut. As such, I refrained from using algorithms and libraries I was unfamiliar with. I did manage to revise the following:
- Using the four-step process of machine learning.
- The algorithm for plotting images from the dataset.
- Using scikit-learn’s make_pipeline function to instantiate models.
- Using seaborn for heatmaps.
RESOURCES
https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html

About Me!
An aspiring data scientist with a great interest in machine learning and its applications. I post my work here in the hope of improving over time.