#9 Handwritten Digit Recognition


I had been in something of a rut for the past few months, trying to work on projects that were clearly beyond my existing skill set. It took some time before I realized I should cut my losses and go back to simpler, shorter projects. Admittedly, I didn’t learn a whole lot that was new during this project, but it did serve as a good refresher.


Data Sources Used

The digits dataset that ships with scikit-learn. It contains 1,797 labeled images of handwritten digits, credited to Gael Varoquaux, and is loaded with the load_digits() function.
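The loading code itself is not shown in this post, so here is a minimal sketch of what it probably looks like. The variable name digits matches the one used later on; the print calls are only there to show the layout of the data.

```python
from sklearn.datasets import load_digits

# load_digits() returns a Bunch object (dictionary-like), not a pandas DataFrame.
digits = load_digits()

# Each image is stored twice: as an 8x8 array and as a flattened row of 64 values.
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)
print(digits.target.shape)  # (1797,)
```

The flattened digits.data array is what the models train on, while digits.images is handy for plotting.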

Methodology

Part One: Libraries Used

Part Two: Setting Up Dataframe

I used the load_digits() function from sklearn.datasets to set up the data. To preview the images and their corresponding labels, I used the approach from Michael Galarnyk’s article (see Resources). From what I understand, it works like this:

1. Create a blank figure.
2. Loop over the first 16 entries of digits.data and digits.target, calling them "image" and "label" respectively.
3. For each entry, add a subplot to the figure on a grid of 2 rows and 8 columns.
4. Reshape each image to (8, 8) and draw it in its subplot with a grayscale colormap.
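The four steps above can be sketched roughly as follows. This is my reconstruction rather than the exact code from the article: the helper name plot_first_digits, the figure size, and the title format are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()


def plot_first_digits(digits, n_rows=2, n_cols=8):
    """Draw the first n_rows * n_cols digits with their labels (hypothetical helper)."""
    # Step 1: create a blank figure.
    fig = plt.figure(figsize=(16, 4))
    n = n_rows * n_cols
    # Step 2: loop over the first n flattened images and their labels.
    pairs = zip(digits.data[:n], digits.target[:n])
    for index, (image, label) in enumerate(pairs):
        # Step 3: add a subplot on the grid (subplot positions are 1-based).
        ax = fig.add_subplot(n_rows, n_cols, index + 1)
        # Step 4: reshape the 64 values back to 8x8 and draw them in grayscale.
        ax.imshow(image.reshape(8, 8), cmap="gray")
        ax.set_title(f"Label: {label}")
        ax.axis("off")
    return fig


fig = plot_first_digits(digits)
```

Calling plt.show() afterwards displays the grid in an interactive session.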

Part Three: Train Test Split
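The split code is not shown here, so the following is a sketch under assumptions. The accuracies in Part Five are all whole multiples of 1/450, which is consistent with a test set of 450 images, and that is what a 25% split of 1,797 images gives. The random_state value is an arbitrary choice for reproducibility, not something taken from the original code.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# test_size=0.25 matches scikit-learn's default; random_state=0 is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (1347, 64) (450, 64)
```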

Part Four: Testing Algorithms

Here, I have streamlined the process of testing an ML algorithm on a clean dataset. It’s a simple four-step process.
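The four steps are not spelled out above. One common reading, and the one sketched here, is: import the model, create an instance, fit it on the training data, then predict on the test data. The StandardScaler step, max_iter, and every random_state value are assumptions, so this sketch is not expected to reproduce the exact accuracies reported in Part Five.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Step 1: import the model classes (done at the top of the file).
# Step 2: create an instance of each model. Hyperparameters are assumptions.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

scores = {}
for name, model in models.items():
    # make_pipeline chains the scaler and the model into a single estimator.
    pipeline = make_pipeline(StandardScaler(), model)
    # Step 3: train the pipeline on the training data.
    pipeline.fit(X_train, y_train)
    # Step 4: predict on unseen data and measure accuracy.
    predictions = pipeline.predict(X_test)
    scores[name] = accuracy_score(y_test, predictions)
    print(f"{name}: {scores[name]}")
```

Keeping the models in a dictionary means adding another algorithm to the comparison is a one-line change.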

Part Five: Accuracy Results

Logistic Regression: 0.9666666666666667
Support Vector Classifier: 0.9844444444444445
Random Forests: 0.8066666666666666
K-Nearest Neighbors: 0.9733333333333334

Every model except the Random Forest Classifier scores above 96%, while the random forest lags at roughly 81%. Maybe I’ll update the post if I find a better-performing configuration for it.

Part Six: Heatmaps

The Random Forest heatmap makes it clear that the model struggles to recognize certain digits, especially 1 and 8, where it faltered the most.
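The heatmap code is not shown, so here is a sketch that assumes the heatmaps are confusion matrices drawn with seaborn. The random forest below uses scikit-learn’s defaults with an arbitrary random_state, which will likely score higher than the configuration reported above, so the error pattern may differ from what I described.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Default hyperparameters; random_state=0 is an assumption for reproducibility.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, model.predict(X_test))

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
ax.set_title("Random Forest confusion matrix")
```

Off-diagonal cells show which digits get confused with which, so a weak row for 1 or 8 would show up as counts spread away from the diagonal.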

As always, the full code can be found here.


REVIEW

As stated in the intro, this project was more of a refresher to get me out of a rut. As such, I refrained from implementing algorithms and libraries I was unfamiliar with. I did manage to revise the following.

  1. Using the four-step process of machine learning.
  2. Plotting images from a dataset.
  3. Using scikit-learn’s make_pipeline function to instantiate models.
  4. Using seaborn for heatmaps.

RESOURCES

https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a

https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html

About Me!

An aspiring data scientist with a great interest in machine learning and its applications. I post my work here in the hope of improving over time.