[2001.04230v1] Considering discrepancy when calibrating a mechanistic electrophysiology model
A large benefit of the calibration methods that include discrepancy is that they better represent uncertainty in predictions, although all the methods we trialled still failed to allow for a wide enough range of possible outputs in certain parts of the protocols.

Abstract: Uncertainty quantification (UQ) is a vital step in using mathematical models and simulations to make decisions. The field of cardiac simulation has begun to explore and adopt UQ methods to characterise uncertainty in model inputs and how that propagates through to outputs or predictions. In this perspective piece we draw attention to an important and under-addressed source of uncertainty in our predictions: uncertainty in the model structure, or the equations themselves. The difference between imperfect models and reality is termed model discrepancy, and we are often uncertain as to the size and consequences of this discrepancy. Here we provide two examples of the consequences of discrepancy when calibrating models at the ion channel and action potential scales. Furthermore, we attempt to account for this discrepancy when calibrating and validating an ion channel model using different methods, based on modelling the discrepancy using Gaussian processes (GPs) and autoregressive-moving-average (ARMA) models, and then highlight the advantages and shortcomings of each approach. Finally, suggestions and lines of enquiry for future work are provided.
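The core idea of the paper, calibrating a model while representing its discrepancy from the data-generating process as a GP, can be sketched in a few lines. The following is a minimal toy illustration only: the "true" process, the imperfect model, and the kernel hyperparameters are all invented here and are not those used in the paper.

```python
import numpy as np

def rbf_kernel(t1, t2, amp=1.0, ls=1.0):
    # Squared-exponential covariance between time points t1 and t2.
    d = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * (d / ls) ** 2)

# Toy stand-in for the data-generating process (DGP) and noisy observations.
t = np.linspace(0, 10, 100)
truth = np.sin(t) + 0.3 * np.sin(3 * t)
rng = np.random.default_rng(1)
data = truth + 0.05 * rng.standard_normal(t.size)

def model(theta, t):
    # Imperfect model class: misses the third-harmonic component of the DGP.
    return theta * np.sin(t)

# Least-squares point estimate of the single parameter.
theta_hat = np.dot(np.sin(t), data) / np.dot(np.sin(t), np.sin(t))

# Model the residual (discrepancy) as a zero-mean GP: delta ~ GP(0, k).
resid = data - model(theta_hat, t)
K = rbf_kernel(t, t, amp=0.3, ls=0.5) + 0.05**2 * np.eye(t.size)
delta_mean = rbf_kernel(t, t, amp=0.3, ls=0.5) @ np.linalg.solve(K, resid)

# Model plus inferred discrepancy tracks the data more closely than
# the model alone ever can.
corrected = model(theta_hat, t) + delta_mean
```

The GP here acts purely as a flexible correction term at the observed inputs; as the paper's examples show, how well such a correction transfers to a new protocol depends heavily on what the GP is conditioned on.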
Figure 1: A comparison of Ten Tusscher (Model T [1], blue) and Fink (Model F [2], green) kinetics. The currents are voltage clamp simulations under the same action potential clamp, shown in the top panels. Only the currents with different kinetics are shown; the kinetics of INa, INaCa, and INaK are identical in both models. Two of the gates in ICaL are identical in the two models, one gate has a different formulation, and Model F has one extra gate compared with Model T. The two models use different formulations for IKr, different parameterisations of the kinetics for IKs and Ito, and different equations for the IK1 steady state. Currents are normalised in this plot by minimising the squared difference between the two models' currents, so that we emphasise differences in kinetics rather than conductances (which are rescaled during calibration). Only ICaL shows what we would typically consider a 'large' difference in kinetics; the remaining currents are apparently close matches between Model T and Model F. (A motivating example of discrepancy)

Figure 2: Model F fitting and validation results. (Top) The model is fitted to the ground truth synthetic data (generated from Model T), using a five action potential recording under a 1 Hz pacing protocol. The calibrated Model F (blue dashed line) shows an excellent fit to the data (grey solid line). (Bottom) Model F validation and prediction for a different context of use (CoU). (Left) The calibrated Model F can match some validation data (2 Hz pacing) very well, giving us (false) confidence in the model's performance. (Right) Despite the excellent fit and validation, the calibrated Model F gives a catastrophic (posterior sample) prediction for the IKr block (CoU) experiments, suggesting that in this example (Left) is not a good validation experiment!
The posterior predictions are model predictions simulated using parameter samples from the posterior distribution (Figure ??); here, 200 samples/predictions are shown. (Model calibration)

Figure 3: Marginal posterior distributions of the Model F parameters, expressed as scaling factors for the conductances in Model T (s_i = g_i(Model F) / g_i(Model T)). Values of 1 would represent the parameters of Model T that generated the data; notably, none of the inferred parameters for Model F is close to 1. The red dashed lines indicate the result of the global optimisation routine. Note that two of these parameters, s_Ks and s_NaK, have distributions hitting the lower bound imposed by the prior, indicating that the calibration process is attempting to make them smaller than 10% of the original Model F parameter values. (Discrepant model predictions)

Figure 4: A cartoon illustrating the effect of model discrepancy on parameter fits in different models. Each cloud represents the range of possible outputs each model can reach with different parameter values. The true data-generating process (DGP) lies outside both of our imperfect model classes 1 and 2, and neither can fit the data perfectly due to model discrepancy. When we attempt to infer parameters, we converge on the model outputs closest to the true DGP under the constraint of remaining within each model class. Adding more data only increases our confidence in being constrained to model parameterisations on the boundary of the particular model; it does not move the model parameters closer to the true parameters used in the data-generating process, i.e. we become certain about f_θ*, the model using the pseudo-true parameter value. Note that a different experiment will lead to different model outputs, shifting these clouds into different shapes, and then different parameter sets may give outputs that are closer to the data-generating process.
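The posterior predictive shown in these figures is built by simulating the model once per posterior parameter sample and summarising the resulting traces. A minimal sketch of that recipe, with a made-up stand-in for both the posterior samples and the simulated current (in the paper these come from MCMC over an ODE model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 200 posterior samples of a conductance scaling factor s.
# (In the paper these are MCMC draws; here we just use a toy distribution.)
s_samples = rng.normal(loc=0.8, scale=0.05, size=200)

t = np.linspace(0.0, 400.0, 200)  # time (ms)

def current(s, t):
    # Toy stand-in for a simulated ionic current under scaling s.
    return s * np.exp(-t / 100.0) * (1.0 - np.exp(-t / 20.0))

# Posterior predictive: one simulated trace per posterior sample.
preds = np.array([current(s, t) for s in s_samples])

# Summaries typically plotted: mean trace and a 95% credible band.
mean_pred = preds.mean(axis=0)
lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)
```

This is the sense in which "200 samples/predictions are shown": each posterior draw yields a full simulated trace, and the spread of those traces is the prediction uncertainty.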
(A statistical explanation of the issue)

Figure 5: Markov model representations of Models A, B, and C used in the ion channel model example. Model C is taken as the ground truth and used to generate synthetic data, whilst Models A and B are candidate models that we attempt to fit and use for predictions, demonstrating both the model discrepancy and model selection challenges. (Ion channel model example)

Figure 6: (Top) Fitting results for Models A (blue) and B (orange) in the ion channel model example. The two models are fitted to the same synthetic data (grey), generated using a third model, Model C, with i.i.d. Gaussian noise added. The voltage clamp protocol for calibration is the sinusoidal protocol [10]. (Bottom) Validation results for Models A (blue) and B (orange). We show predictions from the two fitted models for the unseen ground truth (grey), generated from Model C under the staircase protocol [11]. Note the significant discrepancies around 12 000 ms. (Standard calibration ignoring model discrepancy)

Figure 7: Inferred marginal posterior distributions for Model A's conductance, g in Eq. (??), and kinetic parameters p1, . . . , p8 (a list of parameters referring to Ai,j and Bi,j in Eq. (??)) with different discrepancy models: i.i.d. noise (blue), GP(t) (orange), GP(O, V) (green), and ARMA(2, 2) (red). (Calibration with model discrepancy)

Figure 8: Model A fitted to the sinusoidal calibration protocol using the different discrepancy models: i.i.d. noise, GP(t), GP(O, V), and ARMA(2, 2). The plots show the mean (solid lines) and the 95% credible intervals (shaded) of the posterior predictive for each model. (Calibration with model discrepancy)

Figure 9: Model A predictions with the different discrepancy models: i.i.d. noise, GP(t), GP(O, V), and ARMA(2, 2). The voltage clamp protocol for these predictions is the staircase protocol [11].
We plot the posterior predictive mean (solid lines) with 95% credible intervals (shaded). The red arrows point to the tail current after the two activation steps, which marks a visible area of model mismatch when calibrated without model discrepancy (i.i.d. noise, blue), and highlight how the GP(O, V) and ARMA(2, 2) models handle the mismatch differently. The blue arrow points to an obvious artefact at ∼7000 ms in the GP(t) prediction, which was trained on the sinusoidal protocol and carries no information about the staircase protocol. (Calibration with model discrepancy)
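A key property separating the ARMA discrepancy model from the i.i.d. noise assumption is that the discrepancy is correlated in time. A minimal numpy sketch of simulating an ARMA(2, 2) discrepancy series; the coefficients below are illustrative, not those fitted in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# ARMA(2, 2): d_t = phi1*d_{t-1} + phi2*d_{t-2}
#                   + e_t + th1*e_{t-1} + th2*e_{t-2}
phi = np.array([0.75, -0.2])   # AR coefficients (illustrative, stationary)
th = np.array([0.4, 0.1])      # MA coefficients (illustrative)
sigma = 0.02                   # innovation standard deviation

n = 2000
e = sigma * rng.standard_normal(n)
d = np.zeros(n)
for t in range(2, n):
    d[t] = (phi[0] * d[t - 1] + phi[1] * d[t - 2]
            + e[t] + th[0] * e[t - 1] + th[1] * e[t - 2])

# Lag-1 autocorrelation: clearly positive, unlike i.i.d. noise, so nearby
# residuals inform each other -- but only through recent history, which is
# why ARMA extrapolates to a new protocol more gracefully than GP(t).
acf1 = np.corrcoef(d[1:], d[:-1])[0, 1]
```

In calibration this series plays the role of the residual between model and data; its short memory means predictions on an unseen protocol fall back towards the model output rather than towards features memorised from the training protocol.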