My original project goal was to develop machine learning techniques tailored to cell culture data; however, I knew that no company was going to grant me access to their biological data. The simplest solution was to create my own, since I understood most of the process well enough to simulate it. While any cell model I programmed wouldn’t be fully accurate, an ML technique that could learn that titer was proportional to cell volume (with cell diameter as an input) without being explicitly taught that relationship would also learn other, similarly abstracted correlations. Additionally, the more closely the simulated data resembled real data (with assay errors, differing sample times, varying experimental conditions, and mistakes), the more confident one could be that any ML techniques developed on it could be applied to real data with little hassle.
Along the way, I ran into many gaps in my knowledge that forced me to consult the literature. I wasn’t sure how to calculate pH from the molarity of various acids and bases, which led me to the bicarbonate buffer system and the Henderson-Hasselbalch equation. I learned that glucose consumption was cell-specific, while IgG production was proportional to cell volume (and didn’t change depending on cell stage).
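As a rough illustration of the Henderson-Hasselbalch equation applied to the bicarbonate buffer, here is a minimal Python sketch. The function name and the constants (an apparent pKa of 6.1 and a CO2 solubility of 0.03 mmol/L per mmHg, the textbook values for plasma at 37 °C) are assumptions made for the example; they are not taken from CC-Sim, and real culture media may need different values.

```python
import math

# Assumed textbook constants for plasma at 37 C; culture media may differ.
PKA_CARBONIC = 6.1      # apparent pKa of the CO2 / HCO3- pair
CO2_SOLUBILITY = 0.03   # dissolved CO2 in mmol/L per mmHg of pCO2


def bicarbonate_ph(hco3_mM: float, pco2_mmHg: float) -> float:
    """Henderson-Hasselbalch: pH = pKa + log10([HCO3-] / [dissolved CO2])."""
    dissolved_co2_mM = CO2_SOLUBILITY * pco2_mmHg
    return PKA_CARBONIC + math.log10(hco3_mM / dissolved_co2_mM)


# 24 mM bicarbonate at 40 mmHg CO2 gives a ratio of 20, so pH is about 7.40.
ph = bicarbonate_ph(hco3_mM=24.0, pco2_mmHg=40.0)
print(f"pH = {ph:.2f}")
```

Raising the CO2 partial pressure at fixed bicarbonate lowers the computed pH, which is the behavior a bioreactor pH controller relies on when it sparges CO2.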
I hit a roadblock when I discovered that some parameters changed rapidly. Since kLa could be in excess of 30/min, a step size as large as 1 minute would produce an inherently unstable system, swinging wildly at every step. While a step size of a fraction of a second would solve this, it would also make running multiple simulations that each span weeks of culture time hugely expensive computationally. An analytical solution was not an option, as cell metabolism linked almost every variable, and process controls could contain logical statements. I solved this using various methods, which you can read about in detail here.
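To make the instability concrete, the sketch below applies a forward Euler step to the oxygen-transfer equation dC/dt = kLa (C* - C) with kLa = 30/min and a 1-minute step. Each step multiplies the distance from saturation by (1 - kLa·dt) = -29, so the value flips sign and grows. For comparison it also shows the exact exponential update, which is only valid when kLa and C* are held constant over the step and oxygen uptake by cells is ignored. All names and numbers here are assumptions for illustration; this is not necessarily one of the methods CC-Sim uses.

```python
import math


def euler_step(c, c_sat, kla, dt):
    """Forward Euler update for dC/dt = kla * (c_sat - c)."""
    return c + dt * kla * (c_sat - c)


def exact_step(c, c_sat, kla, dt):
    """Exact update, assuming kla and c_sat are constant over the step."""
    return c_sat + (c - c_sat) * math.exp(-kla * dt)


def simulate(step, c0, c_sat, kla, dt, n_steps):
    """Apply the given step function repeatedly and return the trajectory."""
    history = [c0]
    for _ in range(n_steps):
        history.append(step(history[-1], c_sat, kla, dt))
    return history


# Dissolved oxygen in % of saturation, kLa in 1/min, dt in min (assumed values).
euler = simulate(euler_step, c0=0.0, c_sat=100.0, kla=30.0, dt=1.0, n_steps=3)
exact = simulate(exact_step, c0=0.0, c_sat=100.0, kla=30.0, dt=1.0, n_steps=3)

print(euler)  # → [0.0, 3000.0, -84000.0, 2439000.0]
print(exact)  # settles at saturation (about 100) after the first step
```

Forward Euler on this equation is only stable when kLa·dt < 2, which at 30/min means steps shorter than 4 seconds; that is the trade-off between stability and run time described above.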
One of the biggest gaps in my (and probably everyone’s) knowledge of how to program such a system lies in the cell model, which we are still struggling to understand. I compartmentalized the cell model so that it could be easily swapped out for a different model and still work with CC-Sim. For now, a (relatively) simple cell model stands in.
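One way to picture a swappable cell model is as a small interface that the simulator calls without knowing which model is behind it. The sketch below is hypothetical: `CellModel`, `ConstantGrowthModel`, `advance`, and the state keys are invented for illustration and are not CC-Sim’s actual interface.

```python
from typing import Dict, Protocol


class CellModel(Protocol):
    """Hypothetical interface: map the current state to rates of change."""

    def rates(self, state: Dict[str, float]) -> Dict[str, float]:
        ...


class ConstantGrowthModel:
    """Toy stand-in model: viable cells grow at a fixed specific rate."""

    def __init__(self, growth_rate_per_h: float) -> None:
        self.growth_rate_per_h = growth_rate_per_h

    def rates(self, state: Dict[str, float]) -> Dict[str, float]:
        return {"viable_cells": self.growth_rate_per_h * state["viable_cells"]}


def advance(state: Dict[str, float], model: CellModel, dt_h: float) -> Dict[str, float]:
    """Advance the state by one step using whichever cell model is supplied."""
    new_state = dict(state)
    for name, rate in model.rates(state).items():
        new_state[name] = state[name] + dt_h * rate
    return new_state


state = advance({"viable_cells": 1.0}, ConstantGrowthModel(0.5), dt_h=2.0)
print(state)  # → {'viable_cells': 2.0}
```

Because `advance` depends only on the `rates` method, a more detailed model could replace `ConstantGrowthModel` without changes to the stepping code.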
There are many unanswered questions about how a cell should behave. How does resource consumption change as osmolarity increases and cell growth is inhibited? What about when metabolism is oxygen-limited? Mostly, if I can’t find any literature, I plot my intuitions of how cells behave and come up with equations that describe them.
After writing most of CC-Sim and continuously checking online for how various problems could be solved, I realized that while many people had been talking about how useful a system-level upstream bioprocess simulation could be, nobody had released one to the public. Since even a simple model would be useful, I decided to document CC-Sim, write about it, and release it free to the public, so that people would have access to such a system and could help push the field further.