COVID-19 Open-Source Helpdesk

Forecasting for diagnostic tests


I am a medical biologist and a bioinformatics specialist.
I receive COVID-19 diagnostic test results that are updated every day.

sample1 metadata positive
sample2 metadata positive

I wonder if I can forecast how many samples I will have over the next days based on this dynamic data, for instance with a Bayesian model or something else.

By the way, if you have other ideas for analyses with this kind of per-day data, you are welcome to share them.


Thanks for asking @dridk. I’m sure folks will chime in who can help. Tagging @ananelson @luizirber @fperez too.

Hi, @dridk!

Can you give us a little more information? What format is this data arriving in? Does your example represent rows in a CSV file, or something else?

Do you need help importing, parsing, or collating this data?

And to clarify your question, are you looking to predict, based on the number of total samples, how many total samples you should expect to receive tomorrow?


I just received my first raw data from the PCR LightCycler (it is an XML file). Let me first check what it looks like. I will come back with new questions, I guess.

But yes, predicting the total number of samples for the next days (tomorrow) was my first idea.

This question might be a better fit for


If you need help parsing the XML, making the data easier to use, or converting it to another format, that’s definitely something we can help with here! If those tools aren’t already available, I can write a script/package to help automate this and get the data into any useful form.
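As a rough idea of what such a script could look like, here is a minimal sketch using only the Python standard library. The XML layout (`run`, `sample`, `cp`, `result`) and the values are made up for illustration; the real LightCycler export will almost certainly use different element names, so the lookups would need to be adapted once the actual file is known.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical layout and made-up values: NOT the real LightCycler schema.
EXAMPLE_XML = """<run date="2020-03-25">
  <sample name="sample1"><cp>24.1</cp><result>positive</result></sample>
  <sample name="sample2"><cp></cp><result>negative</result></sample>
</run>"""

FIELDS = ["date", "name", "cp", "result"]


def samples_from_xml(xml_text):
    """Return one dict per <sample> element found in the XML text."""
    root = ET.fromstring(xml_text)
    run_date = root.get("date")
    rows = []
    for sample in root.iter("sample"):
        rows.append({
            "date": run_date,
            "name": sample.get("name"),
            # An empty or missing <cp> element becomes an empty string.
            "cp": sample.findtext("cp") or "",
            "result": sample.findtext("result"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = samples_from_xml(EXAMPLE_XML)
print(rows_to_csv(rows))
```

Once the data is in a flat table like this, counting samples or positives per day is a one-liner in most tools.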

I agree with @tacaswell that the prediction question would be best answered in another forum.


Without any other intuition or explanatory variables, the best estimate for tomorrow is the mean of the number of daily samples. Do you have any intuition about the sample numbers (i.e. they’ll be increasing, they’ll lag the reported deaths, etc.) or extra possible explanatory variables? Do you expect the change to be smooth, i.e. will tomorrow be broadly similar (in some sense) to today?

We can certainly have a stab at a first-order prediction here, as Barry wrote, using simple means or medians. It would be useful to have a look at (a sample of) your data, though, or at least to see a few example time series to get a feeling for what’s possible. Based on that we would also be able to give more detailed advice in general.
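To make the "simple means or medians" idea concrete, here is a minimal sketch. The function name `naive_forecast`, the optional `window` parameter, and the daily counts are all made up for illustration; restricting the estimate to the last few days is one simple way to react to a trend, not a recommendation.

```python
from statistics import mean, median


def naive_forecast(daily_counts, window=None):
    """Estimate tomorrow's sample count from past daily counts.

    If `window` is given, only the most recent `window` days are used.
    """
    if not daily_counts:
        raise ValueError("need at least one day of counts")
    recent = daily_counts[-window:] if window else daily_counts
    return {"mean": mean(recent), "median": median(recent)}


# Made-up counts, one value per day, oldest first.
counts = [38, 41, 43, 40, 47, 52, 55]

print(naive_forecast(counts))            # uses all days
print(naive_forecast(counts, window=3))  # uses only the last three days
```

If the counts are clearly rising, the all-days mean will lag behind; comparing it with the short-window estimate gives a first feeling for how much.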

Thanks all for your answers, and sorry for the delay in replying.

I was building a static web page to visualize the PCR results of COVID analyses from the XML file.
You can see a preview here to get an idea of the data I receive every day. I can put the source code (Python notebook) on GitLab if you want.
This web page corresponds to one analysis (43 samples) from one run of a LightCycler.

As you said, I will run the analysis over several days to see how the number of positive samples increases.
By the way, other colleagues asked me to compute the probability that a sample is positive when a low signal appears.

So, let me see my colleagues tomorrow to understand what they want exactly.


For a Bayesian model, @martinmodrak and @paul-buerkner can probably help.

I can definitely help with the technical part of setting up the model, but less with the theoretical background on tests. Also, it looks like any predictions on positives would depend on epidemiological forecasts, which I have no idea how to do. But if you have a mechanism in mind, I can certainly help you express it.


If I understand correctly, you’ll need to define conditional probabilities, for example:

p(test positive | infected)
p(test positive | not infected)
p(infected | patient with symptoms)
p(infected | patient without symptoms)
p(infected | patient with travel history)

If you can estimate these probabilities based on past observations, you can then compute
p(infected | history) = p(infected, test positive | history) + p(infected, test negative | history)
= ( p(test positive | infected) + p(test negative | infected) ) * p(infected | history)
The second line assumes that, given infection status, the test result does not depend on the history.
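As one concrete use of the conditional probabilities listed above, Bayes' rule turns p(test positive | infected), p(test positive | not infected) and a prior p(infected | history) into the probability of infection given a positive test. The sketch below is only an illustration: the function name is made up, and the numbers are placeholders, not properties of any real PCR assay.

```python
def p_infected_given_positive(sensitivity, false_positive_rate, prior):
    """Bayes' rule for a binary test.

    sensitivity         = p(test positive | infected)
    false_positive_rate = p(test positive | not infected)
    prior               = p(infected | history)
    """
    # Total probability of seeing a positive test.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    if p_positive == 0:
        raise ValueError("a positive test has probability zero under these inputs")
    return sensitivity * prior / p_positive


# Placeholder values chosen only to show the call.
posterior = p_infected_given_positive(
    sensitivity=0.9, false_positive_rate=0.05, prior=0.2
)
print(round(posterior, 3))
```

The same function shows how strongly the answer depends on the prior: with the same test, a lower p(infected | history) gives a noticeably lower posterior.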

You could use epidemiological models (e.g. SEIR) to forecast p(infected | history). To do a good job, however, you would probably need some information about the prevalence of other diseases that cause the same symptoms. Indeed, if COVID-19 has unique symptoms, then having those symptoms is informative about being infected. If another common disease (e.g. flu) has the same symptoms, then having those symptoms could mean you have the flu, and not COVID-19. So p(infected | symptoms) depends on how common flu is, not just COVID-19.
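For readers who have not seen an SEIR model, here is a minimal sketch of the standard four-compartment version (susceptible, exposed, infectious, recovered), integrated with simple Euler steps. The function name and all parameter values are placeholders for illustration, not fitted to COVID-19 data, and using the infectious fraction as a prior for p(infected) is a crude assumption.

```python
def simulate_seir(population, exposed0, infectious0, beta, sigma, gamma,
                  days, steps_per_day=10):
    """Return a list of (S, E, I, R) tuples, one per day, starting at day 0.

    beta  = transmission rate per day
    sigma = rate of leaving the exposed state (1 / incubation period)
    gamma = rate of leaving the infectious state (1 / infectious period)
    """
    s = float(population - exposed0 - infectious0)
    e, i, r = float(exposed0), float(infectious0), 0.0
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Flows between compartments during one small time step.
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Placeholder parameters, chosen only to produce a visible epidemic curve.
population = 100_000
history = simulate_seir(population, exposed0=0, infectious0=10,
                        beta=0.5, sigma=0.2, gamma=0.1, days=60)

# Fraction of the population currently infectious on each day.
prevalence = [i / population for (_, _, i, _) in history]
print(prevalence[-1])
```

A model like this says nothing about other diseases with the same symptoms, so it only covers one of the ingredients discussed above.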

IANAE (I Am Not An Epidemiologist)