Hey,
One of the things I learned when Aki contacted me to ask whether I was still interested in working on bayesplot as part of Google Summer of Code was that I would need to write regular diary entries to document my journey. Since I always think a little too much about how polished things look and how well they are organised, I decided to overhaul my personal website so that I can host a blog on it, which I will be updating biweekly starting today. After spending some time finding a good-looking template and making sure everything looks clean, I can finally start my Google Summer of Code journey with my first diary entry!
My Journey
Before writing this entry, I searched online to see what kind of diary entries other people wrote and what they mentioned, to get some inspiration. For many people, it seems the journey was about searching through hundreds of organisations and tons of projects to find the best one, then spending hours preparing multiple proposals, and finally receiving the good news: getting in. Don’t get me wrong, I also spent a good few hours preparing my proposal and I really put thought into the project. However, it didn’t start for me the way it did for many other people.
I heard about GSoC a couple of months ago, when I was still looking for a summer internship. I was thinking of applying to the JetBrains internship programme, and since they had many different positions open, I was looking at the mentors as well. There, I saw that one project mentor had taken part in GSoC when he was still a student. That caught my eye and I added GSoC to my job applications sheet. As the deadline for GSoC approached, I still didn’t have a summer internship, so I decided to take a look at the projects offered. Browsing the website, I came across many interesting projects that I would be happy to be part of. I also saw NumFOCUS, an umbrella organisation for many open source software projects, including the one that I am now part of, Stan.
I first encountered Stan in autumn 2024, when I started my master’s in machine learning at Aalto University and took Bayesian Data Analysis with Aki Vehtari. We were asked to use Stan in the course assignments and the course project, which I quite enjoyed. I enjoyed it so much that I will be a teaching assistant for the 2025 iteration of the course. Since Aki and some of the course staff are also Stan contributors, the instruction on Stan was very useful and I genuinely enjoyed using it. So, when I met with Aki to discuss being a teaching assistant in his course, I also asked about GSoC. That was the point where things actually took off for me. I had only a day and a half left to apply. By spending as much time as possible on the proposal, getting very useful help from Aki, and going without sleep for a day, I managed to put together a well-prepared proposal. After that, it was time to wait…
May 8 came while I was in Turkey, on an extended spring break that I had planned in the hope that I would have a summer internship lined up. When I booked the tickets and made my plans for that break, I was quite hopeful that I would have something for the summer, but when May 8 came, I had nothing in hand. I had been rejected from more than 100 positions that I applied for, at different stages: sometimes after multiple tests and interviews, sometimes without even talking to anyone. GSoC was the only application still unanswered. With those thoughts in my mind, I opened the email from Google:
Thank you for applying to be a Google Summer of Code 2025 contributor.
Unfortunately your proposal with NumFOCUS was not accepted. Every year we receive more contributor proposals than we are able to accept. We hope you will apply again in the future.
To be honest, I was disappointed. I thought I had a good proposal, and I had prior experience with Stan, so it seemed fairly logical to me that I would get picked. The reality was different, however. The project instead went to Teemu, who now helps me (I thank him so much!). When I saw that, it made sense, to be fair. He had a published paper about bayesplot, so of course he was more qualified than I was. With that, I started to get used to the idea of not having anything set for the summer and working on my personal projects instead. That lasted only a few days, until I got an email from Aki:
Hi Behram,
I guess you have noticed that GSoC did not fund you. We did get 2/3 proposals funded. However, if you are still interested, there might still be a chance. One student, who was selected, would like to drop out, and it might be possible to switch the project to you. Are you still interested?
Aki
I was very interested! I didn’t have anything set for the summer and I really enjoyed working with Stan, so I got extremely excited. Aki and I had a quick call where he explained the situation to me, and we agreed to wait for the response from Google. When Google responded positively a couple of days later, my GSoC journey officially began! I am so excited to be part of the Stan team and to contribute to the development of bayesplot. It feels fascinating to be part of the development of a software package that I was a user of a couple of months ago for my course project. I hope I can achieve all of my goals and have a very good journey with Stan, one that doesn’t end with GSoC!
My Project
The bayesplot package is a popular tool for visualizing Bayesian analysis results in R. As part of the Stan ecosystem, it helps researchers understand their statistical models through clear graphics [1]. This project will make the package even more useful by adding new visualizations, particularly for predictive checks of discrete and categorical outcomes. I’ll focus on creating intuitive plots that help users check whether their models match real-world observations. Another important goal is improving the documentation to make the package easier for newcomers to use and contribute to. These updates will help users of the Stan framework present their results more easily with new graphics and visuals. This, together with improved documentation, will help bayesplot continue serving the growing Bayesian community effectively.
Building upon the existing robust framework of bayesplot, the project will primarily focus on implementing the recommendations outlined in “Recommendations for visual predictive checks in Bayesian workflow” by Säilynoja et al. [2] Here is the list of visualisations that I aim to add to bayesplot:
- Quantile dot plot: A quantile dot plot is a dot plot in which a set number of observed quantiles are visualised instead of the observations themselves. It shows the distribution of a variable by plotting dots at specific quantiles of the data. It is not currently included in bayesplot. However, since ggplot2 provides dot plots through geom_dotplot, it can be implemented for bayesplot as well.
- Binned calibration plots: In binned calibration plots, the binary observations are divided into a predetermined number of uniform bins based on the event probabilities predicted by the model. Comparing the mean predicted probability to the observed event rate within each bin allows assessing the calibration, or reliability, of the model. At the moment, bayesplot doesn’t have a function to create binned calibration plots. However, one can be implemented using ggplot2 functionality.
- PAV-adjusted calibration plots: This is a more advanced calibration plot in which the observed proportions are smoothed via isotonic regression (the pool-adjacent-violators, or PAV, algorithm) to visualise monotonic miscalibration. It is not included in bayesplot at the moment, though it can be implemented using the reliabilitydiag and ggplot2 libraries.
- Residual plots: Even though bayesplot offers residual plots through its ppc_error_* group of functions, those functions are geared more towards continuous data, with styles such as scatter plots. That group of functions can be improved to handle discrete and categorical data better and to create plots for a wider range of data types. This is possible by combining ggplot2 functions.
- Bounded KDE plots: When the observation density is smooth and unbounded, visualisations aimed at continuous distributions generally represent the data correctly. However, when the data do not meet the assumptions a visualisation makes about the observation distribution, the result becomes problematic. That is the case when bounded data is visualised with KDEs. Currently, bayesplot provides functions that visualise with KDEs, such as ppc_dens_*. However, those functions don’t handle bounded data well. It would be better to implement functions in bayesplot that take care of bounded data with proper boundary correction methods.
- Rootograms: bayesplot offers its own ppc_rootogram function to plot rootograms. However, it needs an upgrade for better visuals and more emphasis on the discreteness of the data. There is a visualisation proposed by Säilynoja et al. that can be implemented for bayesplot as well. With this update, rootograms in bayesplot would be better suited to their purpose.
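To make the two calibration ideas from the list more concrete, here is a minimal sketch of the computations behind a binned calibration plot and a PAV-adjusted one. It is written in plain Python purely for illustration and is not how this would be implemented in bayesplot, which is an R package. The function names binned_calibration and pav_calibration and the toy data are hypothetical, and the sketch assumes binary outcomes coded as 0/1 and predicted probabilities in [0, 1].

```python
from statistics import mean


def binned_calibration(probs, outcomes, n_bins=10):
    """Per non-empty bin: (mean predicted probability, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Uniform-width bins on [0, 1]; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    return [
        (mean(p for p, _ in members), mean(y for _, y in members), len(members))
        for members in bins
        if members
    ]


def pav_calibration(probs, outcomes):
    """Isotonic (non-decreasing) fit of outcomes on predictions via pool-adjacent-violators.

    Returns one (predicted probability, calibrated event rate) pair per unique prediction.
    """
    # Aggregate tied predictions first so they always share one fitted value.
    grouped = {}
    for p, y in zip(probs, outcomes):
        entry = grouped.setdefault(p, [0.0, 0])
        entry[0] += y
        entry[1] += 1

    # Each block is [sum of outcomes, number of observations, predictions in the block].
    blocks = []
    for p in sorted(grouped):
        total, count = grouped[p]
        blocks.append([total, count, [p]])
        # Pool adjacent blocks while the earlier block has the larger mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            t, c, ps = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2].extend(ps)

    return [(p, total / count) for total, count, ps in blocks for p in ps]


# Toy data, made up for illustration only.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]

for row in binned_calibration(probs, outcomes, n_bins=2):
    print(row)
for point in pav_calibration(probs, outcomes):
    print(point)
```

In a plot, the binned version would show one point per bin against the diagonal, while the PAV version gives a monotone step curve without having to choose a number of bins.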
Community Bonding Period
During the community bonding period, I was still in Turkey, so our meetings mostly happened online. I had already met with Aki multiple times, but during this period I had a chance to meet the other mentors of the project: Jonah, Noa, and Teemu, who moved from being the contributor to mentoring me in the project. We had a meeting simply to get to know each other, but also to discuss our goals and plans. I was also instructed in the workflow used for contributing to Stan. Having been given full access to the bayesplot repository, I am able to contribute to the project more easily and with less hassle. Apart from the meetings, I dug deeper into the codebase during this period to get a better view of the repository and to design my roadmap in more detail. Teemu created a GitHub project which will serve as a project management board during the GSoC period, and we agreed to use GitHub issues to discuss the items on our roadmap in more detail. With everything set and agreed on, I am now ready to work on bayesplot and enjoy my time at GSoC.
See you in the next entry of GSoC diaries!
[1] Gabry et al. (2019). Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. https://doi.org/10.1111/rssa.12378
[2] Säilynoja et al. (2025). Recommendations for visual predictive checks in Bayesian workflow. [Online]. Available: https://arxiv.org/abs/2503.01509