GSoC Diaries #7: Code Refactoring

Hey,

Sometimes you make plans, but life has other plans. I believe that code refactoring is inevitable for every developer and every software package. Sometimes it is a complete overhaul, and sometimes it is just rewriting a single function, as I did over the last two weeks. Just as I thought I was done adding the new discrete style to ppc_rootogram, I was, reasonably, told that the code I wrote was fairly messy and would be much easier to maintain if I refactored it. The idea was to split it into two functions, move the data processing into a newly created internal function, and change the logic that decides how the visuals are drawn based on the arguments the user passes. To be honest, it made complete sense. I am not going to be around all the time, and if the functionality I added created extra problems in the future, that would be a big issue. So I got to work refactoring ppc_rootogram, effectively rewriting it from the ground up.

The tricky part of ppc_rootogram is that it offers four visual styles the user can choose from: standing, hanging, suspended, and discrete. The first three are fairly similar to each other. They use the same visual elements, such as histograms, lines, and filled areas, while the discrete style uses points and point ranges. The styles also use different shades of the chosen colour to indicate different things, as well as different labelling and even different values on the y-axis. The different y-axis values in particular meant that the data preparation for the discrete style differs from that of the other styles. In the standing, hanging, and suspended styles, the y-axis values are plotted at equal intervals on a square-root scale, whereas in the discrete style they are plotted at squared intervals on a square-root scale. In the end, I created an internal helper function named .ppc_rootogram_data that prepares the data matrix each style needs. It sits at the end of the ppc-discrete file and is not exported, so it is not part of the package's user-facing interface. ppc_rootogram calls this helper at the very beginning to get the correct data matrix. After that, an if statement creates different geoms depending on the user's style preference. Finally, those geoms and other stylistic elements are combined in a last if statement that adds them to the plot and adjusts its appearance according to the chosen style. This refactoring made the code much more readable and easier to maintain, and it will also make it easier to deprecate old styles or add new ones in the future. After some final bug fixing, the PR was approved by two reviewers, Teemu and Jonah, and I merged it into the main branch. This was also the first merge I have done on bayesplot! You know what, it felt good.
I felt like a little kid who had been allowed to play with their big brother's more complex toys. You can see that work here.
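To make the structure of the refactor concrete, here is a minimal sketch of the same pattern in Python rather than R. It is not bayesplot's code: the names `_rootogram_data` and `rootogram`, the dictionary return values, and the choice to keep raw frequencies for the discrete style (leaving the square-root transform to the axis) are assumptions for illustration. The bar formulas follow the usual standing, hanging, and suspended rootogram definitions. The point is the split: a private helper shapes the data per style, then the public function branches on the style to pick its layers and axis treatment.

```python
from collections import Counter
from math import sqrt

STYLES = ("standing", "hanging", "suspended", "discrete")


def _rootogram_data(y, yrep, style):
    """Hypothetical private helper: prepare per-count frequencies for one style."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    max_value = max(max(y), max(max(draw) for draw in yrep))
    support = list(range(max_value + 1))
    obs_counter = Counter(y)
    draw_counters = [Counter(draw) for draw in yrep]
    observed = [obs_counter[k] for k in support]
    # Expected frequency of each count value: mean over the replicated draws.
    expected = [sum(c[k] for c in draw_counters) / len(yrep) for k in support]
    if style == "discrete":
        # Assumption: keep raw frequencies; the square-root transform is
        # applied to the y-axis scale when drawing.
        return {"x": support, "observed": observed, "expected": expected}
    # Bar-based styles: plot square roots of the frequencies directly.
    sqrt_obs = [sqrt(v) for v in observed]
    sqrt_exp = [sqrt(v) for v in expected]
    if style == "standing":
        # Bars rise from zero; the expected curve is drawn on top.
        bottom, top = [0.0] * len(support), sqrt_obs
    elif style == "hanging":
        # Bars hang down from the expected curve.
        bottom = [e - o for e, o in zip(sqrt_exp, sqrt_obs)]
        top = sqrt_exp
    else:  # "suspended"
        # Bars show the difference between expected and observed.
        bottom = [0.0] * len(support)
        top = [e - o for e, o in zip(sqrt_exp, sqrt_obs)]
    return {"x": support, "expected": sqrt_exp, "bar_bottom": bottom, "bar_top": top}


def rootogram(y, yrep, style="standing"):
    """Hypothetical public function: prepare data first, then branch on style."""
    data = _rootogram_data(y, yrep, style)
    # First branch: choose the visual elements ("geoms").
    if style == "discrete":
        layers = ["pointrange: expected", "point: observed"]
    else:
        layers = ["bars: observed", "line: expected"]
    # Final branch: style-specific axis treatment (an assumption, see above).
    y_scale = "sqrt" if style == "discrete" else "identity"
    return {"data": data, "layers": layers, "y_scale": y_scale}
```

In this layout, adding a new style means one new branch in the helper and one in the layer selection, and removing a style touches the same two places.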

I am glad to say this was not all I did; I also did some maintenance work. For a long time, it bothered me that there were two separate functions, ppc_error_scatter_avg and ppc_error_scatter_avg_vs_x. They did almost the same work, except that the latter took an x argument, which it plotted on the x-axis, moving the error values to the y-axis. As I mentioned in my last blog post, I was planning to combine these functions, so I went ahead and did that. The change simply checks whether anything was passed to the optional x argument and, if so, runs a few checks on it and passes it correctly to the helper function that does the actual plotting. This didn't take long to implement, but I am happy that I ticked another box and moved another task on my project board to the done column. You can see that work here.
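The merge described above boils down to an optional argument with a "nothing passed" default. The Python sketch below is illustrative only: `error_scatter_avg`, `_error_scatter`, and the dictionary "plot spec" are hypothetical stand-ins for the R functions, and the single length check on x is an assumed example of the kind of validation involved, not the package's actual checks.

```python
from statistics import fmean


def _error_scatter(points_x, points_y, x_label, y_label):
    """Hypothetical shared helper that would do the drawing; here it returns a spec."""
    return {"x": points_x, "y": points_y, "x_label": x_label, "y_label": y_label}


def error_scatter_avg(y, yrep, x=None):
    """Hypothetical merged function: x is optional."""
    # Average predictive error per observation: mean over draws of y_i - yrep_i.
    avg_err = [fmean(yi - draw[i] for draw in yrep) for i, yi in enumerate(y)]
    if x is None:
        # No x supplied: errors on the x-axis, y on the y-axis.
        return _error_scatter(avg_err, list(y), "average error", "y")
    # x supplied: validate it, then move the errors to the y-axis.
    x = list(x)
    if len(x) != len(y):
        raise ValueError("x must have the same length as y")
    return _error_scatter(x, avg_err, "x", "average error")
```

Because both paths end in the same helper, the two public entry points collapse into one without duplicating any plotting code.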

Next week is the final week of Google Summer of Code, so I only plan to do a handful of things. First, I hope to finish implementing ppc_residual_scatter. It initially seemed quite complicated to me, but Teemu and I discussed it and realised there should be a relatively straightforward way to implement it. If I still have time after that, I'll pick up some tasks from the open issues and work on them. I also need to write my final report and make my final submission to Google. See you sooner next time!