Identifying Cancer Through Deep Learning and Explainability Exploration
By Hongyi James Chen
Senior Category (Grades 11-12)
Innovation | Big Data / AI, Biology

This project examines the feasibility of building a deep learning model that accurately distinguishes leukemic cells from normal cells, as well as the viability of analyzing the resulting black-box model so that its decisions can be explained in human terms. An accurate and explainable model could be deployed in areas inaccessible to the limited existing methods for detecting leukemia in its early stages. This is acknowledged to be a difficult task: cells vary between individuals, and cancerous cells are highly morphologically similar to normal ones.
There are two major steps to this project. First, we built the cell classification model by supervised training of a convolutional neural network on labeled training data. Many parameters had to be adjusted to improve results on a separate testing set. Second, we created a program designed to help explain the neural network; the trained model was inserted into this program and tested on multiple examples. We examined the outputs, which highlight the regions the algorithm deems important to its decision, and attempted to explain the algorithm's behavior as fully as possible.
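The abstract does not specify the network architecture, image size, or dataset, so the following is only an illustrative sketch of the first step (supervised training of a convolutional neural network on labeled cell images), written in PyTorch with synthetic stand-in data. The `CellCNN` class, the 64x64 input size, and all layer sizes are assumptions for illustration, not the project's actual model.

```python
# Minimal sketch of a supervised CNN for two-class cell images
# (leukemic vs. normal). Architecture and data are illustrative only.
import torch
import torch.nn as nn

class CellCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Two small conv blocks, each halving spatial resolution: 64 -> 32 -> 16.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)  # two classes

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

def train_step(model, images, labels, optimizer, loss_fn):
    """One supervised update: forward pass, loss, backprop, parameter step."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Synthetic stand-in batch: 8 random 64x64 RGB "cell" images with labels.
torch.manual_seed(0)
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))

model = CellCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss0 = train_step(model, images, labels, opt, nn.CrossEntropyLoss())
```

In a real run, this training step would loop over many epochs of labeled cell images, with the tuned parameters (learning rate, layer sizes, etc.) evaluated against a held-out test set as the abstract describes.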

Though far from perfect, the results are intriguing. Our algorithm had an accuracy of 68% on the testing data set. This is clearly suboptimal; however, it is also significantly higher than chance, demonstrating that our algorithm learned useful features and that this research direction is worth exploring further. The main difficulty is the aforementioned similarity between leukemic and normal cells; thus, the training process must be complex and finely tuned to yield good results. After analyzing the results from the explanation program, we determined that our algorithm focuses on the contour of cells in its decision-making process, especially irregular or protruding sections. To conclude, our algorithm is far from the level required for practical applications; however, we demonstrated that such algorithms are possible and can be explained.
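The abstract does not name the explanation method used, so as one illustration of how region-highlighting explanations can be produced, here is a simple model-agnostic occlusion-sensitivity sketch in NumPy: a patch of the image is masked out, and the drop in the classifier's score marks how important that region was. The `toy_score` function and all sizes are hypothetical stand-ins for the project's actual model.

```python
# Occlusion sensitivity: mask each patch and record the score drop.
# Large drops mark regions the model relies on (e.g. cell contours).
import numpy as np

def occlusion_map(image, score_fn, patch=8):
    """Return a coarse heat map of score drops, one cell per occluded patch."""
    base = score_fn(image)
    h, w = image.shape[:2]
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = image.mean()  # gray out patch
            heat[i // patch, j // patch] = base - score_fn(occluded)
    return heat

# Toy stand-in "model": scores an image by the brightness of its
# top-left quadrant, so the heat map should peak there.
def toy_score(img):
    return img[:16, :16].mean()

img = np.zeros((32, 32))
img[:16, :16] = 1.0  # bright top-left quadrant
heat = occlusion_map(img, toy_score, patch=8)
```

A real Grad-CAM-style or occlusion analysis over many cell images is what would reveal the contour-focused behavior described above.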