GSoC Diaries #3: First PR – Behram Ulukır

Hey,

In the last blog post, I mentioned that I had been working on dot plots. You know what? It was much more complicated than I anticipated. If you remember, I told you that there are two options to use under the hood when implementing dot plots for bayesplot: ggplot2's geom_dotplot and ggdist's stat_dots. The former offers a simple solution for the feature I was trying to implement, while the latter has more advanced capabilities at the cost of adding a new dependency to the package. I didn't want that. I was confident that I could reproduce what stat_dots offers by tweaking geom_dotplot. A fault confessed is half redressed: after spending a significant amount of time implementing auto-sized dots and the stack ratio of the dot plot based on the number of dots in the tallest stack, I realised that tweaking geom_dotplot would not be enough. I would need to implement a whole new geom and essentially rebuild what stat_dots offers from scratch. After consulting with Jonah, Aki, and Teemu, I decided that using stat_dots was the better way to go.

We didn't want to make ggdist a hard dependency, so instead we added it as a suggested dependency of the package (listed under Suggests). On top of that, we added a check to the newly implemented ppc_qdotplot and ppd_qdotplot functions to see whether ggdist is installed. The rest was relatively straightforward. Following similar functions that were already implemented, such as ppc_hist, I built ppc_qdotplot and ppd_qdotplot. They are simply functions that take draws from a Bayesian model (plus the observed data, in the case of ppc_qdotplot), format them nicely and turn them into plots by passing them to stat_dots.
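For anyone curious what such a check can look like, here is a minimal sketch of the usual R pattern for a package listed under Suggests: call requireNamespace() and stop with an informative message if the package is missing. The names check_suggested and qdotplot_sketch, and the choice of quantiles = 100, are hypothetical illustrations, not bayesplot's actual internals.

```r
# Hypothetical helper: stop with an informative error if a package
# listed under Suggests is not installed; otherwise return TRUE invisibly.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is required for this plot. Please install it.", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical plotting function showing where the check goes:
# it runs before any ggdist function is touched.
qdotplot_sketch <- function(y, quantiles = 100) {
  check_suggested("ggdist")
  # ggdist depends on ggplot2, so ggplot2 is available once the check passes.
  # Setting `quantiles` in stat_dots() draws a quantile dot plot.
  ggplot2::ggplot(data.frame(y = y), ggplot2::aes(x = y)) +
    ggdist::stat_dots(quantiles = quantiles)
}
```

Because the check sits at the top of the function, users who never call the dot plot functions don't need ggdist at all, and those who do get a clear message instead of an obscure "there is no package called" error.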

However, there is more to open-source software development than just implementing functions. The code also needs detailed documentation and a sensible testing structure. I started with the documentation simply because it was easier. Since bayesplot uses roxygen2 to generate its documentation, I had to follow a certain structure. Following the existing documentation, I explained the new functions in detail and added an example showcasing how they work.

I also wrote tests for the new functions. bayesplot uses the testthat package for its unit tests, which fall into two main types: the first checks that functions return the correct types for different combinations of arguments, and the second checks whether the plots produced by the functions have changed since the last time the tests were run. For the second type, plots are saved as SVG files and compared with the output of the next test run. The only issue with this approach is that the SVG files sometimes change without any visible change to the graphics, which leads to spurious test failures. To implement these tests for ppc_qdotplot and ppd_qdotplot, I followed the existing examples. My only addition was wrapping some of the test calls in expect_warning. This is because stat_dots emits a warning when the dots don't fit in the plot, which is not what these unit tests are checking and can safely be ignored in this context. In effect, this lets those tests tolerate the warning instead of flagging it.

Everything I have described above makes up my first PR to the bayesplot library. The PR is still waiting to be merged, and I will address any issues raised in my mentors' feedback. I am happy with my work so far, and I feel like I am learning a lot, not just in terms of technical skills but also about the general processes of software development. If everything goes well with these functions, my next plan is to work on discrete residual plots. I hope to explain that part in the next blog post!