genes are so important for triggering the immune system that we can use these genes to predict a person’s immune response. Here I’ll show how to estimate disease rates just from immune gene frequencies. All the steps, from getting the immune gene data to identifying high-risk countries and assessing the limitations of the model, are discussed, and the full code is available at github.com/DAWells/HLA_spondylitis_rate.
HLA genes are associated with a person’s response to infection and vaccination, and are often very strongly linked to autoimmune diseases. So strongly linked, in fact, that in large groups we can predict disease rates from HLA gene frequencies. HLA frequencies are widely studied and so often available, allowing us to estimate rates of autoimmune conditions that are missing or inaccurate because of the challenges of diagnosis. In this post we’ll combine studies to generate accurate estimates of immune gene frequencies and use these to predict national rates of ankylosing spondylitis.
allelefrequencies.net is a database of human immune gene frequency data from around the world, and it is an open access, free and public resource (Gonzalez-Galarza et al 2020). However, it can be difficult to download and combine data from multiple projects, which makes it hard to take advantage of all this data. Luckily, HLAfreq is a Python package which makes it easy to get the latest data from allelefrequencies.net and prepare them for our analysis. (Full disclosure, I’m one of the authors of HLAfreq!)
Ankylosing spondylitis is a form of arthritis, and 90% of patients have a specific version of the HLA-B gene. To get the frequency of this version in different countries, I downloaded all available frequency data for this gene and combined studies of the same country, weighting by sample size. In brief, the combination is based on the Dirichlet distribution, and we can use a Bayesian approach to estimate uncertainty too. Singapore is used as an example in the figure below (all figures in this article are generated by the author). Different HLA-B gene versions (also known as alleles) are shown on the y axis, with their frequency in Singapore on the x axis. Data from the original Singapore studies are shown in colour, and combined estimates in black. I focused on the weighted average in this analysis, which is shown by the black circles. HLAfreq also calculates a Bayesian estimate with uncertainty, which is indicated by the black bars.
The code used to download, combine, and plot the HLA-B allele frequency data for Singapore is below.
import pandas as pd
import HLAfreq
from HLAfreq import HLAfreq_pymc as HLAhdi

# Download raw data
base_url = HLAfreq.makeURL("Singapore", standard="g", locus="B")
aftab = HLAfreq.getAFdata(base_url)
# Prepare data: keep only complete studies and reduce alleles to one-field resolution
aftab = HLAfreq.only_complete(aftab)
aftab = HLAfreq.decrease_resolution(aftab, 1)
# Combine data from multiple studies
caf = HLAfreq.combineAF(aftab)
# Bayesian estimate with 95% credible intervals
hdi = HLAhdi.AFhdi(aftab, credible_interval=0.95)
caf = pd.merge(caf, hdi, how="left", on="allele")
# Plot gene frequencies
HLAfreq.plotAF(caf, aftab.sort_values("allele_freq"), hdi=hdi, compound_mean=hdi)
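To make the idea behind the combination step more concrete, here is a minimal sketch of a sample-size-weighted average and a Dirichlet posterior mean, written in plain Python. It is not HLAfreq's implementation: the study data are made up, the function names are hypothetical, and the flat prior and the "two alleles per person" count conversion are assumptions chosen for illustration.

```python
# Hypothetical data: three studies of one country, as (sample_size, {allele: frequency}).
# All values are invented for illustration only.
studies = [
    (100, {"B*27": 0.05, "B*40": 0.20, "other": 0.75}),
    (300, {"B*27": 0.03, "B*40": 0.25, "other": 0.72}),
    (50, {"B*27": 0.08, "B*40": 0.18, "other": 0.74}),
]


def weighted_average(studies):
    """Average each allele's frequency across studies, weighting by sample size."""
    total_n = sum(n for n, _ in studies)
    alleles = sorted({a for _, freqs in studies for a in freqs})
    return {
        a: sum(n * freqs.get(a, 0.0) for n, freqs in studies) / total_n
        for a in alleles
    }


def dirichlet_posterior_mean(studies, prior=1.0):
    """Posterior mean under a flat Dirichlet prior (an assumption of this sketch).

    Each study contributes 2 * n * frequency pseudo-counts per allele,
    because every sampled person carries two copies of the gene.
    """
    alleles = sorted({a for _, freqs in studies for a in freqs})
    counts = {
        a: sum(2 * n * freqs.get(a, 0.0) for n, freqs in studies) for a in alleles
    }
    total = sum(counts.values()) + prior * len(alleles)
    return {a: (counts[a] + prior) / total for a in alleles}


combined = weighted_average(studies)
posterior = dirichlet_posterior_mean(studies)
print(combined)
print(posterior)
```

With large samples the two estimates are nearly identical. The advantage of the Dirichlet view is that it also describes the uncertainty around each frequency, which is what the black bars in the figure represent.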
Now that we have the national allele frequencies, we can pair them with national disease rates to study the correlation. I’ve used the disease rates reported in Dean et al 2014. I log transformed the disease rate to make it normally distributed so I could fit an ordinary least squares linear regression. As expected, there was a significant positive correlation; countries with higher frequencies of HLA-B*27 had higher rates of ankylosing spondylitis. The exception to this was Finland, which had an unusually high frequency of HLA-B*27 but a middling rate of disease. I removed Finland from the model as an outlier, a decision which was supported by “statistical leverage”. (Leverage means this one point had too large an influence on the overall model; we want the model to tell us about countries in general, not any one country in particular.)
We can use our linear regression model to predict rates of ankylosing spondylitis in countries where we know the HLA-B*27 frequency. This tells us that countries like Austria and Croatia have high predicted ankylosing spondylitis rates. Using these predictions increases the number of countries with disease rate estimates from 16 to 52 and can help identify countries that could benefit from more surveillance. In the world map below, countries with low known or predicted rates of ankylosing spondylitis are plotted in blue and high rates in yellow. Countries with known rates are outlined in black and those with predicted rates are outlined in cyan or orange. Cyan is used for countries in the range of our model and orange is used for countries outside our model’s range; see below for why this is important.
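As a rough illustration of the regression and the leverage check, the sketch below fits log(rate) against allele frequency by ordinary least squares and computes each point's leverage. The data are invented, the country labels are placeholders, and the cut-off of 2 × (number of parameters) / n is a common rule of thumb assumed here rather than the criterion used in the original analysis.

```python
import math

# Hypothetical (HLA-B*27 frequency, disease rate) pairs; not real data.
data = {
    "A": (0.01, 5.0),
    "B": (0.02, 9.0),
    "C": (0.03, 14.0),
    "D": (0.04, 20.0),
    "E": (0.05, 30.0),
    "F": (0.15, 18.0),  # unusually high frequency, middling rate
}


def fit_ols(points):
    """Fit log(rate) = intercept + slope * frequency by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverage(points):
    """Leverage in simple linear regression: h_i = 1/n + (x_i - x_bar)^2 / Sxx."""
    xs = [x for x, _ in points]
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


points = list(data.values())
lev = dict(zip(data, leverage(points)))

# Rule of thumb (an assumption of this sketch): flag leverage above 2 * p / n,
# where p = 2 parameters (intercept and slope).
threshold = 2 * 2 / len(points)
high_leverage = [name for name, h in lev.items() if h > threshold]

# Refit the model without the flagged points.
kept = [xy for name, xy in data.items() if name not in high_leverage]
intercept, slope = fit_ols(kept)
print(high_leverage, intercept, slope)
```

Leverage depends only on how far a point's x value sits from the others, which is why a country with an extreme allele frequency can dominate the fitted line.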

We should be cautious about predicting disease rates for countries with HLA-B*27 frequencies outside the range of our model. Of the 36 countries we have predicted disease rates for, 10 have HLA-B*27 frequencies higher or lower than any country we used in our model. Therefore, we can’t be sure the model will give accurate predictions for these countries. In particular, predictions may be unreliable for countries with high HLA-B*27 frequencies; we already know that Finland didn’t fit our model. This could be because of a non-linear trend, but we do not have enough data to explore these high frequencies.
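One simple way to keep track of this is to flag every prediction made outside the range of frequencies used to fit the model. The sketch below assumes a log-linear model like the one described above; the coefficients, training frequencies and country labels are all hypothetical.

```python
import math

# Hypothetical fitted coefficients and training frequencies (not the article's values).
intercept, slope = 1.5, 30.0
training_freqs = [0.01, 0.02, 0.03, 0.04, 0.05]

# Hypothetical countries with a known HLA-B*27 frequency but no known disease rate.
new_countries = {"X": 0.025, "Y": 0.005, "Z": 0.09}


def predict_rate(freq):
    """Back-transform the model's log-scale prediction to a disease rate."""
    return math.exp(intercept + slope * freq)


lo, hi = min(training_freqs), max(training_freqs)
predictions = {
    name: {"rate": predict_rate(f), "extrapolated": not (lo <= f <= hi)}
    for name, f in new_countries.items()
}

for name, p in predictions.items():
    label = "outside model range" if p["extrapolated"] else "within model range"
    print(f"{name}: {p['rate']:.1f} ({label})")
```

Carrying the flag alongside each prediction makes it straightforward to colour the points differently, as in the cyan and orange outlines on the map.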

The countries with known disease rates are plotted with filled points. Finland, which was omitted from the model, is plotted in pink. The predicted disease rates are plotted as open circles, cyan for countries in the model’s range and orange outside of it. The confidence intervals of the model are shown as dashed lines, and the prediction intervals are shown as a grey ribbon. A quick reminder about the difference: we expect the true relationship to fall within the confidence intervals 95% of the time, and we expect 95% of data points to fall within the prediction intervals.
It’s worth taking a moment to remind ourselves that despite this correlation, there are many other factors influencing disease rates. Clearly an individual’s chance of developing ankylosing spondylitis is also affected by their environment and other genetic factors. So if we wanted really accurate disease rate predictions, we would need to consider these other variables. But given how easy it is to get HLA frequency data, it’s a pretty impressive predictor for a disease that can take years to diagnose.
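For readers who want the distinction in symbols, these are the standard textbook intervals for a simple linear regression at a new frequency $x_0$, on the log scale used by the model. Here $s$ is the residual standard error, $n$ the number of countries in the fit, and $S_{xx} = \sum_i (x_i - \bar{x})^2$; this notation is introduced for this note and does not come from the original analysis.

```latex
% 95% confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{0.975,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% 95% prediction interval for a new observation at x_0
\hat{y}_0 \pm t_{0.975,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root accounts for the scatter of individual countries around the line, so the prediction interval is always the wider of the two. Both intervals widen as $x_0$ moves away from $\bar{x}$, which is another reason to be cautious at extreme frequencies.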
Conclusion
HLA genes have a strong impact on human health through infection, vaccination, autoimmune diseases, and organ transplants. Because of these strong relationships, we can use widely available HLA frequency data to study these health traits indirectly. Resources like allelefrequencies.net and HLAfreq make it easier to study these relationships, either by looking at these correlations directly or by using allele frequencies as a proxy when other data is missing. I hope this post has got you thinking about questions to ask using HLA frequency data.
References
Gonzalez-Galarza, F. F., McCabe, A., Santos, E. J. M. D., Jones, J., Takeshita, L., Ortega-Rivera, N. D., … & Jones, A. R. (2020). Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Research, 48(D1), D783-D788.
Dean, L. E., Jones, G. T., MacDonald, A. G., Downham, C., Sturrock, R. D., & Macfarlane, G. J. (2014). Global prevalence of ankylosing spondylitis. Rheumatology, 53(4), 650-657.
Wells, D. A., & McAuley, M. (2023). HLAfreq: Download and combine HLA allele frequency data. bioRxiv, 2023-09. https://doi.org/10.1101/2023.09.15.557761