GSoC Chronicles: Final Weeks Report (11 & 12)

Wrapping up my GSoC experience

Hi Guys!!

Before I dive into the last two weeks of my GSoC journey, I have to tell you about my recent trip to the Deep Learning Indaba. It was such an amazing experience! From preparing for the event to attending the sessions and meeting so many new faces, it was all a rush of excitement. After the event, I took a short break to gather my thoughts and settle back in.

Now, back to GSoC. In those final two weeks, I had some significant milestones. First, I managed to get the Python script to pass all the PyG tests, which felt like a win. I also worked with the Graph Attention Network on our dataset. The highlight? My pull request, which included the dataset, was accepted by PyG while I was in Accra for the Deep Learning Indaba 2023. That was a fantastic moment! 🎉

Let me take you through the details of how I got there.

My Progress

In the concluding weeks of my GSoC project, my primary focus was on two crucial tasks: ensuring the Python script passed all requisite tests and getting my pull request merged.

Out of the 14 critical tests, my pull request initially cleared 12 but failed two: the pre-commit and the latest pytest tests. A closer look revealed that unsorted imports and code formatting issues caused the pre-commit failures. To fix this, I carefully compared my script with another PyG dataset's script and aligned mine with it in detail, down to the punctuation. This attention to detail paid off, as the pre-commit test then passed.

The pytest failure was caused by a bug, which, to my relief, the maintainer promptly fixed. Once it was fixed, this test passed too. This was my first official contribution, and the experience taught me valuable lessons for handling similar challenges in the future. I'm grateful for the learning opportunity. I've showcased these results in the images below.

Additionally, I refined the two repositories established over the summer, enhancing the READMEs to provide clearer insights into the project's accomplishments and outcomes.

In week 11, I ventured into building a Graph Neural Network (GNN) using GATConv, the Graph Attention Network (GAT) layer in PyG. Admittedly, this was new territory for me. To navigate it, I relied on instructional videos from the PyTorch Geometric site, and I am grateful to my mentor for supplementing these with additional resources. The journey wasn't without its challenges. I encountered numerous errors, but with my mentor's guidance and diligent research, I overcame them. The outcome? The GAT model slightly outperformed the GCN, with an MSE of 916 against the GCN's 957 (lower is better). This progress was gratifying. While it's a promising start, I recognize there's room for further improvement, which I'm keen to explore post-GSoC.
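For readers new to graph attention, the sketch below shows the core computation such a layer performs: each node scores its neighbours, turns the scores into weights with a softmax, and averages the neighbours' transformed features with those weights. It is a from-scratch, single-head illustration in plain Python and not the GATConv-based model from this project; the function names, the toy graph, and the weights are all made up for the example.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU; 0.2 is the negative slope used in the GAT paper."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, edges, W, a):
    """Single-head graph attention (illustrative, hypothetical helper).

    features: list of node feature vectors
    edges:    list of directed (src, dst) pairs; self-loops are added here
    W:        weight matrix, out_dim rows by in_dim columns
    a:        attention vector of length 2 * out_dim
    Returns (outputs, attention), where attention[i][j] is the weight
    node i puts on neighbour j.
    """
    n = len(features)
    z = [matvec(W, h) for h in features]  # transformed features W h_i

    # neighbours[i] = nodes sending a message to i, plus i itself
    neighbours = {i: {i} for i in range(n)}
    for src, dst in edges:
        neighbours[dst].add(src)

    outputs, attention = [], []
    for i in range(n):
        nbrs = sorted(neighbours[i])
        # Unnormalised score e_ij = LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood (shifted by the max for stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention.append(dict(zip(nbrs, alphas)))
        # Attention-weighted sum of the neighbours' transformed features
        outputs.append([
            sum(al * z[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ])
    return outputs, attention


# Toy path graph 0 - 1 - 2 with one feature per node (made-up values)
features = [[1.0], [2.0], [3.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
W = [[1.0]]
a = [0.0, 0.0]  # zero attention vector: every neighbour is weighted equally
outputs, attention = gat_layer(features, edges, W, a)
```

With the attention vector set to zero, every neighbour gets the same weight, so each output is a plain neighbourhood average. A trained attention vector shifts those weights toward the neighbours that matter most for the prediction, which is the main difference from the fixed weighting a GCN layer uses.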

In my last week, I dedicated time to my final report, which was required for evaluation. Balancing this with preparations for the upcoming Indaba conference was indeed a challenge. However, I'm pleased to share that I managed both. For those curious about the details of my GSoC final report, you can check it out at this link:

https://summerofcode.withgoogle.com/programs/2023/projects/NBZn0Zm3

Finally,

The dataset was merged into the datasets that ship with PyG and is now available for everyone to access. If you want to use it without writing code, you can download it from the Zenodo link, where it is stored as brca_tcga.zip: https://zenodo.org/record/8251328. The image below also shows it on the PyG documentation site.

You can also use it like any other PyG dataset, following the guidelines in the PyG tutorials. A sample is shown below:
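Here is a minimal sketch of what loading it can look like, assuming PyG 2.4 or later, where the dataset is exposed as `torch_geometric.datasets.BrcaTcga`. The helper name, the root directory, and the batch size are placeholders chosen for illustration, and the first call needs an internet connection to download the files.

```python
def load_brca_tcga(root="data/BrcaTcga", batch_size=32):
    """Load the BRCA TCGA dataset through PyG (illustrative helper).

    Assumes torch and torch_geometric >= 2.4 are installed. The imports
    live inside the function so this file can be imported without them.
    """
    from torch_geometric.datasets import BrcaTcga
    from torch_geometric.loader import DataLoader

    # Downloads and processes the data on first use, then caches it in `root`
    dataset = BrcaTcga(root=root)
    # Mini-batches of graphs, as with any other PyG dataset
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    return dataset, loader
```

Calling `dataset, loader = load_brca_tcga()` and then `print(dataset[0])` shows a single sample as a PyG `Data` object, and iterating over `loader` yields batches ready to pass to a model.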

Summary of Progress Made

  1. Submitted a pull request to PyG, which was approved and merged.

  2. Completed and submitted my final report, which passed the evaluation.

  3. Developed a GNN model using Graph Attention Networks, specifically GATConv.

In Conclusion,

I'm thrilled with the outcomes of my GSoC project and the wealth of knowledge I've amassed over the past three months. The insights gained are priceless, and words fall short of conveying my deep appreciation to my mentors, the NRNB organization, and especially Google for bestowing this remarkable opportunity upon many like me.

This journey introduced me to new technologies, fostered my contribution to open source, and, notably, helped me curate a dataset for the bioinformatics community, a feat I find particularly exhilarating. My efforts also took a tangible form when I showcased my project at two significant events: AI Bootcamp 2023 and Deep Learning Indaba 2023. I'm proud to share that my presentations won awards at both, with the AI Bootcamp recognizing mine as the best poster!

Looking forward, my aspirations include refining my project outcomes, intensifying my contributions to open source, penning more technical blogs, and broadening my footprint in the AI domain.

(P.S. If you are interested in knowing more about my project, feel free to check it out on GitHub: https://github.com/cannin/gsoc_2023_pytorch_pathway_commons)

Thanks to everyone who has joined me on this journey during the past 3 months.