GSoC Diaries #4: New Arguments and Error Plots

Hey,

In the last blog post, I talked about quantile dot plots and opening my first official PR on the repository. Since then, that PR needed a few updates and took me a couple more days to finalise. One of my favourite things about GSoC is how much I learn about software development that I wouldn’t learn if it weren’t an open-source-focused programme. For example, a small but very important thing I missed in my first PR was compiling the documentation. I thought writing the documentation would be enough, but apparently I needed to compile it explicitly so that it actually shows up on the website. After getting feedback from my mentors and fixing such little mistakes, I managed to finish the PR. It has now been merged into the main branch, and I am waiting for it to be included in the next release of the package. Congratulations to me, I guess!

As there are lots of things to implement as part of my project, I quickly started working on my next task. This time, I wanted to focus on another part of bayesplot: the error/residual plots. In their paper, Säilynoja et al.¹ highlighted the importance of residual plots that are robust in visualising both continuous and discrete data. So far, the error plots in bayesplot have been geared towards continuous data, which is why some changes are needed.

There have already been discussions about error plots and whether we need alternative approaches for them, so I created a new issue to gather those discussions, which produced two important tasks: first, adding an optional x argument to the ppc_error_binned function, and second, implementing a new plot (possibly named ppc_residual) which would plot y - stat(y_rep) on the y-axis and stat(y_rep) on the x-axis. Currently, the ppc_error_* functions plot stat(y - y_rep) on the y-axis and stat(y_rep) on the x-axis. Our discussion led us to think that the second task requires careful thought, especially regarding how we want to structure the package. For example, an important question is whether we want to create a new ppc_residual_* family of functions or include the new plot in the ppc_error_* family. Since we haven’t settled this yet, I decided to start with something I could implement right away: adding an optional x argument to the ppc_error_binned function.
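To make the idea of the optional x argument a bit more concrete, here is a minimal sketch of binned errors for a single replicated data set. It is written in plain Python purely for illustration and is not bayesplot's actual code: the function name binned_errors, the n_bins parameter, and the equal-size binning are all assumptions of mine. The only point it tries to show is that the errors y - y_rep are averaged within bins, and that the bins are formed from x when it is supplied and from the predictions otherwise.

```python
from statistics import mean


def binned_errors(y, y_rep, x=None, n_bins=2):
    """Average the errors y - y_rep within bins.

    Hypothetical helper, not part of bayesplot. Observations are binned by
    x when it is given, otherwise by the predictions y_rep.
    Returns one (mean of binning variable, mean error) pair per bin.
    """
    key = y_rep if x is None else x
    if not (len(y) == len(y_rep) == len(key)):
        raise ValueError("y, y_rep and x must have the same length")

    errors = [obs - pred for obs, pred in zip(y, y_rep)]

    # Sort observation indices by the binning variable, then cut the sorted
    # indices into chunks of (roughly) equal size.
    order = sorted(range(len(y)), key=lambda i: key[i])
    size = -(-len(order) // n_bins)  # ceiling division
    bins = [order[i:i + size] for i in range(0, len(order), size)]

    return [(mean(key[i] for i in b), mean(errors[i] for i in b)) for b in bins]


# Toy data: four observations, one replicated data set, one predictor.
y = [1.0, 2.0, 3.0, 4.0]
y_rep = [1.5, 1.5, 3.5, 3.0]
x = [10.0, 20.0, 30.0, 40.0]

by_prediction = binned_errors(y, y_rep)     # bins formed from y_rep
by_predictor = binned_errors(y, y_rep, x)   # bins formed from x
```

In both calls the second element of each pair is the average error in that bin; only the variable placed on the x-axis (and used for binning) changes.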

To be honest, this task was a fair bit easier than implementing a whole new function from scratch. As there were already helper functions to use, all I had to do was change a handful of lines, update the documentation, and write new tests to make sure the new behaviour of the function works as intended. The most important part of the task, though, was not breaking anything: in a package with quite a lot of users, the last thing you want is to mess with functions that people might be using in production code.

After a quick implementation phase, I finished the necessary changes and created a new PR for them. The new code passes all the tests, doesn’t break any existing code, and implements the changes just as described. Hopefully, this PR will quickly make its way to the main branch. I am quite happy with ramping up my shipping speed, and I hope to keep this up. I still haven’t settled on what I should implement next, but let’s keep that a mystery for the next diary entry!

¹ Säilynoja et al. (2025). Recommendations for visual predictive checks in Bayesian workflow. [Online]. Available: https://arxiv.org/abs/2503.01509