Hey,
One of the things I became aware of when Aki contacted me to ask whether I was still interested in working on bayesplot as part of Google Summer of Code was that I would need to write regular diary entries to document my journey. As I tend to overthink how polished and well organised things look, I decided to overhaul my personal website so that it can host a blog, which I will be updating biweekly starting today. After finding a good-looking template and making sure everything looks clean, I can finally start my Google Summer of Code journey with my first diary entry!
My Journey
Before writing this entry, I searched online to see what other people wrote in their diary entries, hoping to get some inspiration. For many people, the journey seems to have been about searching through hundreds of organisations and tons of projects to find the best one, spending hours preparing multiple proposals, and finally receiving the good news: getting in. Don’t get me wrong, I also spent a good number of hours preparing my proposal, and I really put thought into the project; however, it didn’t start the same way for me as it did for many others.
I heard about GSoC a couple of months ago, while I was still looking for a summer internship. I was thinking of applying to the JetBrains internship programme, and since they had many positions open, I was also looking at the mentors. There, I saw that one project mentor had experience with GSoC from his student days. That caught my eye, and I added GSoC to my job applications sheet. As the GSoC deadline approached and I still didn’t have a summer internship, I decided to take a look at the projects on offer. Browsing the website, I came across many interesting projects that I would have been happy to be part of. I also saw NumFOCUS, an umbrella organisation for many open-source software projects, including the one I am now part of: Stan.
I first encountered Stan in autumn 2024, when I started my master’s in machine learning at Aalto University and took Bayesian Data Analysis with Aki Vehtari. We used Stan for the course assignments and the course project, which I quite enjoyed; so much so that I will be a teaching assistant for the 2025 iteration of the course. Since Aki and some of the course staff also contribute to Stan, the Stan material in the course was very useful, and I genuinely enjoyed using it. So, when I met with Aki to discuss being a teaching assistant for his course, I also asked about GSoC. That was the point where things really took off for me. I had only a day and a half left to apply, but by spending as much time as possible on the proposal, getting very useful help from Aki and running on little sleep for a day, I managed to put together a well-prepared proposal. After that, it was time to wait…
When May 8 came, I was in Turkey on an extended spring break that I had planned while hoping to land a summer internship. When I booked the tickets, I was quite hopeful that I would have something lined up for the summer, but on May 8, I had nothing in hand. I had been rejected from more than 100 positions at different stages, sometimes after multiple tests and interviews, sometimes without talking to anyone at all. GSoC was the only application still awaiting an answer. With those thoughts in mind, I opened the email from Google:
Thank you for applying to be a Google Summer of Code 2025 contributor.
Unfortunately your proposal with NumFOCUS was not accepted. Every year we receive more contributor proposals than we are able to accept. We hope you will apply again in the future.
To be honest, I was disappointed. I thought I had a good proposal and prior experience with Stan, so I assumed I had a fair chance of being picked. The reality was different, however: the project went to Teemu instead, who is now helping me, and I am very thankful to him! When I saw that, it made sense, to be fair. He had published a paper about bayesplot, so of course he was more qualified than I was. With that, I started getting used to the idea of not having anything set for the summer and working on my personal projects instead. That lasted only a few days, until I got an email from Aki:
Hi Behram,
I guess you have noticed that GSoC did not fund you. We did get 2/3 proposals funded. However, if you are still interested, there might still be a chance. One student, who was selected, would like to drop out, and it might be possible to switch the project to you. Are you still interested?
Aki
I was so interested! I didn’t have anything set for the summer, and I really enjoyed working with Stan, so I was extremely excited. Aki and I had a quick call in which he explained the situation, and we agreed to wait for Google’s response. When Google came back with a positive answer a couple of days later, my GSoC journey officially began! I am so excited to be part of the Stan team and to contribute to the development of bayesplot. It feels amazing to help develop a software package that I was using for my course project only a couple of months ago. I hope I can achieve all of my goals and have a great journey with Stan, one that doesn’t end with GSoC!
My Project
The bayesplot package is a popular tool for visualising the results of Bayesian analyses in R. As part of the Stan ecosystem, it helps researchers understand their statistical models through clear graphics.[1] This project will make the package even more useful by adding new visualisations, particularly for predictive checks of discrete and categorical outcomes. I’ll focus on creating intuitive plots that help users check whether their models match real-world observations. Another important goal is improving the documentation to make the package easier for newcomers to use and contribute to. Together, the new graphics and clearer documentation will help Stan users present their results more easily and help bayesplot keep serving the growing Bayesian community.
Building on the existing, robust framework of bayesplot, the project will primarily focus on implementing the recommendations outlined in “Recommendations for visual predictive checks in Bayesian workflow” by Säilynoja et al.[2] Here is the list of visualisations I aim to add to the bayesplot library:
- Quantile dot plot: a dot plot in which a set number of quantiles of the observed distribution are drawn instead of the observations themselves, with one dot placed at each quantile. bayesplot does not include it yet, but since ggplot2 provides the geom_dotplot function, it can be implemented for bayesplot as well.
- Binned calibration plots: the binary observations are divided into a predetermined number of uniform bins based on the event probabilities predicted by the model. Comparing the mean predicted probability with the observed event rate within each bin makes it possible to assess the calibration, or reliability, of the model. bayesplot doesn’t currently have a function for binned calibration plots, but one can be implemented using ggplot2.
- PAV-adjusted calibration plots: a more advanced calibration plot in which the observed proportions are smoothed via isotonic regression, using the pool-adjacent-violators (PAV) algorithm, to visualise monotonic miscalibration. bayesplot does not include it at the moment, though it can be implemented using the reliabilitydiag and ggplot2 packages.
- Residual plots: even though bayesplot offers residual plots through its ppc_error_* family of functions, those functions are geared towards continuous data, with styles such as scatter plots. They could be improved to handle discrete and categorical data better and to support a wider range of data types, using combinations of ggplot2 functions.
- Bounded KDE plots: when the observations come from a smooth, unbounded distribution, visualisations aimed at continuous distributions, such as kernel density estimates (KDEs), generally represent the data correctly. Problems arise when the data do not meet those assumptions, for example when bounded data are visualised with a standard KDE, which spreads probability mass past the boundary and underestimates the density near it. bayesplot currently provides KDE-based functions such as ppc_dens_*, but they don’t handle such cases very well. It would be better to implement functions in bayesplot that handle bounded data with appropriate boundary correction methods.
- Rootograms: bayesplot offers its own ppc_rootogram function for plotting rootograms. However, it could use an upgrade to improve the visuals and put more emphasis on the discreteness of the data. Säilynoja et al. propose a visualisation that could be implemented in bayesplot as well, making rootograms better suited to this purpose.
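To make the quantile dot plot idea from the list above more concrete, here is a minimal sketch in Python rather than R. bayesplot itself is an R package, and none of this is its API. The sketch makes three assumptions:

- 20 dots, placed at probabilities (i - 0.5)/20;
- R's default (type 7) linearly interpolated quantiles;
- simple histogram-style stacking of the dots.

The helper names `quantile_dots` and `stack_dots` are hypothetical. ggplot2's geom_dotplot uses its own binning rules.

```python
from math import floor


def sample_quantile(sorted_draws, p):
    """Linearly interpolated sample quantile (same rule as R's default, type 7)."""
    n = len(sorted_draws)
    pos = p * (n - 1)
    lo = floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_draws[lo] + frac * (sorted_draws[hi] - sorted_draws[lo])


def quantile_dots(draws, n_dots=20):
    """Hypothetical helper: n_dots quantiles at probabilities (i - 0.5) / n_dots."""
    s = sorted(draws)
    return [sample_quantile(s, (i - 0.5) / n_dots) for i in range(1, n_dots + 1)]


def stack_dots(values, binwidth):
    """Hypothetical helper: stack one dot per value in equal-width columns.

    Returns (x position of the column centre, stack height) for every dot.
    """
    heights = {}
    dots = []
    for v in sorted(values):
        b = floor(v / binwidth)
        heights[b] = heights.get(b, 0) + 1
        dots.append(((b + 0.5) * binwidth, heights[b]))
    return dots
```

For example, `stack_dots(quantile_dots(draws), 0.5)` gives the coordinates of 20 stacked dots. Each dot stands for 5% of the probability mass, which is what makes the plot easy to read.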
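The binned calibration check described in the list above fits in a few lines. The Python sketch below is an illustration only, not bayesplot code, and it makes these assumptions:

- equal-width bins on [0, 1] (equal-count bins are a common alternative);
- empty bins are skipped.

`binned_calibration` is a hypothetical name.

```python
def binned_calibration(probs, outcomes, n_bins=10):
    """Hypothetical helper: equal-width binning of predicted probabilities on [0, 1].

    Returns (mean predicted probability, observed event rate, count)
    for every non-empty bin, in bin order.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    # per bin: [sum of predicted probabilities, number of events, count]
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # a probability of exactly 1.0 goes into the last bin instead of overflowing
        b = min(int(p * n_bins), n_bins - 1)
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in sums if n > 0]
```

A well-calibrated model gives points close to the diagonal, with the mean predicted probability roughly equal to the observed rate in every bin.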
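The PAV step behind the PAV-adjusted calibration plot works as follows:

1. Sort the binary outcomes by predicted probability.
2. Fit a non-decreasing curve to them.
3. Whenever two neighbouring blocks have means out of order, merge them into one block with their pooled mean.

The Python sketch below illustrates the plain pool-adjacent-violators algorithm. It is not the reliabilitydiag package's implementation, and it gives tied predicted probabilities no special treatment. `pav_calibration` is a hypothetical name.

```python
def pav_calibration(probs, outcomes):
    """Hypothetical helper: isotonic (non-decreasing) fit of outcomes
    ordered by predicted probability, via pool-adjacent-violators.

    Returns (sorted predicted probabilities, fitted event rates).
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    # each block: [sum of outcomes, number of points]
    blocks = []
    for i in order:
        blocks.append([float(outcomes[i]), 1])
        # merge backwards while the previous block's mean exceeds the newest one's
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return [probs[i] for i in order], fitted
```

Plotting the fitted rates against the sorted probabilities gives the PAV-adjusted calibration curve. Because the fit is a step function, it needs no bin-count choice.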
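One common boundary correction for the bounded KDE plots mentioned in the list above is reflection. Each observation is mirrored across the bound(s), so the kernel mass that would spill past a boundary is folded back inside. The Python sketch below assumes a Gaussian kernel, a fixed bandwidth and a single reflection per bound. It is only one possible method, not necessarily the one bayesplot will use, and `reflected_kde` is a hypothetical name.

```python
from math import exp, pi, sqrt


def gaussian_kernel(u):
    """Standard normal density."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)


def reflected_kde(data, x, bandwidth, lower=None, upper=None):
    """Hypothetical helper: Gaussian KDE evaluated at x, with reflection
    across the optional lower and upper bounds."""
    # the density is zero outside the support
    if (lower is not None and x < lower) or (upper is not None and x > upper):
        return 0.0
    total = 0.0
    for xi in data:
        total += gaussian_kernel((x - xi) / bandwidth)
        if lower is not None:
            # mirror image of xi across the lower bound
            total += gaussian_kernel((x - (2 * lower - xi)) / bandwidth)
        if upper is not None:
            # mirror image of xi across the upper bound
            total += gaussian_kernel((x - (2 * upper - xi)) / bandwidth)
    return total / (len(data) * bandwidth)
```

Without reflection, the estimate near a bound is roughly halved, because half of each nearby kernel lies outside the support. With reflection, the estimate integrates to about one over the bounded interval. A known side effect is that the reflected estimate has zero slope at the boundary, which can be a limitation for densities that are steep there.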
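For reference, the classic hanging rootogram works like this:

- Observed and expected frequencies of each count are compared on a square-root scale.
- The bars for the observed frequencies hang from the expected curve.
- With perfect fit, every bar bottom sits at zero, so misfit shows up as bar bottoms above or below zero.

The Python sketch below computes those quantities from observed counts and a list of posterior predictive draws. It is the standard hanging rootogram, not the variant proposed by Säilynoja et al., and `hanging_rootogram` is a hypothetical name.

```python
from collections import Counter
from math import sqrt


def hanging_rootogram(y_obs, y_rep_draws, max_count):
    """Hypothetical helper: for each count k in 0..max_count return
    (k, sqrt(expected frequency), sqrt(observed frequency), bar bottom).

    The expected frequency is the average frequency of k across the
    predictive draws; the bar hangs from sqrt(expected) down by sqrt(observed).
    """
    obs = Counter(y_obs)
    exp_freq = Counter()
    for draw in y_rep_draws:
        exp_freq.update(draw)  # accumulate counts of each value over all draws
    n_draws = len(y_rep_draws)
    rows = []
    for k in range(max_count + 1):
        e = sqrt(exp_freq[k] / n_draws)
        o = sqrt(obs[k])
        rows.append((k, e, o, e - o))
    return rows
```

The square-root scale keeps rare counts visible next to frequent ones. A bar bottom below zero marks a count that the model under-predicts.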
Community Bonding Period
During the community bonding period, I was still in Turkey, so our meetings were mostly online. I had already met with Aki several times, but during this period I had the chance to meet the other mentors of the project: Jonah, Noa, and Teemu, who went from being a contributor to mentoring me on the project. We had a meeting to get to know each other and to discuss our goals and plans. I was also introduced to the workflow used for contributing to Stan, and with full access to the bayesplot repository, I can contribute to the project more easily and with less hassle. Apart from the meetings, I dug deeper into the codebase to get a better view of the repository and plan my roadmap in more detail. Teemu created a GitHub project that will serve as our project management board during the GSoC period, and we agreed to use GitHub issues to discuss the items on our roadmap in more detail. With everything set and agreed on, I am now ready to work on bayesplot and enjoy my time at GSoC.
See you in the next entry of GSoC diaries!
1. Gabry et al. (2019). Visualization in Bayesian workflow. J. R. Stat. Soc. A, 182: 389-402. https://doi.org/10.1111/rssa.12378
2. Säilynoja et al. (2025). Recommendations for visual predictive checks in Bayesian workflow. arXiv preprint. https://arxiv.org/abs/2503.01509