Reproducibility and Experiment Workflows


This is my first post in a series about my participation in the 1st Summer of Reproducibility at UCSC, 2023.

Since my earliest experiences in science, reproducibility has always been essential. The scientific method requires that any scientific result be replicable by anyone, anywhere, provided they start from the same premises. This requirement is what lets us trust that an observed consequence really follows from its proposed cause. If a particular implication consistently holds true under repeated experimentation, it can be considered a scientific truth.

Although much can be said about what reproducibility means, replicating results in day-to-day Computer Science experiments can be a significant challenge for individuals, companies, and labs. This challenge has become even more pronounced with the rise of analytics and AI, where scientific methodologies are applied extensively on an industrial scale, beyond the limits of academia. Reproducibility now plays a key role in the productivity and accountability expected of Data Scientists, Machine Learning Engineers, and others engaged in ML/AI projects.

Experiments

In day-to-day work, the pitfalls of non-reproducibility appear at different points of the experiment lifecycle, and they grow when multiple experiments must be managed by an individual scientist or across a team of scientists. In a typical experiment workflow, reproducibility concerns show up in tracking dataset provenance, managing changes to hypothesis tests, controlling the system hardware and OS, and handling the outputs of model instances. In academic environments, these issues can result in mistakes and inaccuracies. In companies, they can lead to inefficiencies and technical debt that is difficult to address later.
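To make those four concerns a little more concrete, here is a minimal sketch in Python of a run "manifest" that records a dataset fingerprint, the parameters under test, the random seed, and basic system information. It is only an illustration under my own assumptions: the function names (`dataset_fingerprint`, `build_manifest`, `manifest_to_json`) and the manifest fields are hypothetical, not part of any particular tool or of the Summer of Reproducibility projects.

```python
import hashlib
import json
import platform
import sys


def dataset_fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest identifying the exact dataset contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(dataset: bytes, params: dict, seed: int) -> dict:
    """Collect facts needed to rerun an experiment later (hypothetical schema)."""
    return {
        "dataset_sha256": dataset_fingerprint(dataset),  # dataset provenance
        "params": params,                                # hypothesis / configuration under test
        "seed": seed,                                    # source of randomness
        "python": sys.version.split()[0],                # interpreter version
        "os": platform.system(),                         # operating system name
        "machine": platform.machine(),                   # hardware architecture
    }


def manifest_to_json(manifest: dict) -> str:
    """Serialize with sorted keys so identical manifests produce identical text."""
    return json.dumps(manifest, sort_keys=True, indent=2)


if __name__ == "__main__":
    # Toy in-memory dataset; a real run would read the bytes from a file.
    manifest = build_manifest(b"x,y\n1,2\n", {"learning_rate": 0.01}, seed=42)
    print(manifest_to_json(manifest))
```

Storing a record like this next to each model output is one simple way to tell later which data, settings, and machine produced a given result. Dedicated experiment-tracking tools cover far more ground than this.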

Based on my experiences in both industry and academia, I became intrigued by the issue of reproducibility. This was my first motivation for participating in the Summer of Reproducibility. I wanted to learn more about this subject while contributing to open-source initiatives that aim to tackle this problem.

This is my introductory post about my Summer Journey in Reproducibility. In my next post, I will write about the concept of reproducibility and discuss interpretations found in current scientific literature.
