Hi everyone,
My research team is training a machine learning model to detect anomalies in data transmissions across satellite links.
To improve the model's accuracy, we need thousands of labeled training examples of both healthy and damaged files.
Manually corrupting files one by one is simply not feasible for a dataset of this size.
We're looking for a guide or a Python library for generating corrupted test data files at scale. Any help from the data science community would be amazing. Thanks in advance.
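In case it helps anyone answering: here is the kind of thing we have in mind, a minimal sketch using only the Python standard library. It copies a healthy file and overwrites a handful of random byte positions to produce a "damaged" variant, with a seed so each sample is reproducible. All function and file names here are just placeholders for illustration.

```python
import random
from pathlib import Path

def corrupt_copy(src, dst, n_flips=64, seed=None):
    """Write a corrupted copy of src to dst by altering n_flips random bytes.

    Each selected byte is shifted by a nonzero offset mod 256, so every
    flip is guaranteed to change the byte it touches. Illustrative only;
    adapt the corruption model to whatever faults your links produce.
    """
    rng = random.Random(seed)
    data = bytearray(Path(src).read_bytes())
    if not data:
        raise ValueError("cannot corrupt an empty file")
    for _ in range(n_flips):
        pos = rng.randrange(len(data))
        data[pos] = (data[pos] + rng.randrange(1, 256)) % 256
    Path(dst).write_bytes(bytes(data))

def make_dataset(src, out_dir, count=1000, n_flips=64):
    """Generate `count` reproducible corrupted variants of one healthy file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        corrupt_copy(src, out / f"corrupt_{i:05d}.bin", n_flips=n_flips, seed=i)
```

Note that uniform random byte flips are only one fault model; if your real channel produces burst errors or truncation, the labels will be more useful if the corruption routine mimics those instead.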