New Genes for Eye Colour (kcl.ac.uk)
41 points by gmays on March 12, 2021 | 26 comments


> In addition, the team found that eye colour in Asians with different shades of brown is genetically similar to eye colour in Europeans ranging from dark brown to light blue.

This was first proposed in 1978 by Crystal Gayle, who also expressed every known gene for hair length then.

https://www.youtube.com/watch?v=C9lz_yzrGZw


Because the author of this article didn't cite the source: https://advances.sciencemag.org/content/7/11/eabd1239


I'm curious why this is only being discovered now. Haven't we had, for decades, databases with people's eye color and genome? Can't you just run a regression and get a list of genes correlated with color?

I'm sure the explanation for why it's not trivial would be interesting.


I'm not sure if this applies to the human genome, but viruses like covid compress many proteins into the same sequence. The ribosome iterates over the same genes, starting each new iteration at i+1. This process runs a finite number of times, producing a different protein every time. Not trivial to compare.
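
A toy sketch of the "same sequence, different starting offset" idea described above. The RNA string is made up and the codon table is a deliberately tiny subset of the standard genetic code, so this only illustrates how shifting the reading frame changes the product; it is not a model of how any particular virus actually does it.

```python
# Toy illustration: the same RNA read in three different frames.
# CODONS is a small subset of the standard genetic code, chosen for this
# example only; codons missing from it are shown as "?".
CODONS = {
    "AUG": "M", "GCC": "A", "UGG": "W", "CCA": "P",
    "GGC": "G", "CAU": "H", "UAA": "*",
}

def translate(rna, offset):
    """Translate rna codon by codon, starting at position `offset`."""
    protein = []
    for i in range(offset, len(rna) - 2, 3):
        protein.append(CODONS.get(rna[i:i + 3], "?"))
    return "".join(protein)

rna = "AUGGCCAUGGCC"  # hypothetical sequence
for offset in range(3):
    print(offset, translate(rna, offset))
# prints:
# 0 MAMA
# 1 WPW
# 2 GHG
```

Three offsets give three unrelated amino-acid strings from one stretch of RNA, which is why comparing such sequences base by base is not the same as comparing the proteins they encode.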


I'm not sure that happens with human genomes, either. I tend to think it's unlikely, simply because it seems improbable that human genomes experience any meaningful selection pressure for information density.

Virus particles are typically very small, and viral genomes thus need to cram whatever they require into as little space as possible - being more information-dense means being able to fit more genes and thus more functionality, which I should think would be a pretty significant adaptive advantage.

Human cell nuclei, by contrast, are quite roomy - by a virus's standards, practically palatial, given that the largest known viral genome is less than 0.001x the size of a human one. As it is, all our chromosomes fit into the nucleus with a good deal of space left over, and still could do even if considerably larger than they actually are right now. So, while a mechanism like you describe certainly could evolve, it'd be under no special pressure to do so.


Something like this happens in humans.[0] Eukaryotic gene expression is quite complex.

0. https://en.m.wikipedia.org/wiki/Alternative_splicing


That seems to be a different process, occurring at a different stage of expression - the link describes different ways eukaryotic mRNA can be spliced prior to translation; grandparent comment refers to a detail of how the translation process itself works with viral mRNA.

Specifically, that reference appears to be to one of apparently several mechanisms of non-canonical translation in viral RNA [1]:

> The focus is on the different translational strategies that RNA viruses employ for accessing multiple ORFs in mRNAs.

An "ORF", or "open reading frame", is a translatable sequence of bases, usually understood to begin immediately after a common "header" sequence at the start of an RNA strand. But viral genomes appear to have not even just one trick, but a whole bag of them, for getting host ribosomes to translate the same genome in multiple ways, as grandparent commenter describes. Which is really neat!

Also, it looks like I should've done some literature review before my own prior comment. Per [2], ribosome profiling - a technique which may be newer than my own experience in the field [3], and of which I was in any case unaware - has been used to show that eukaryotic RNAs likewise can and perhaps frequently do encode multiple ORFs which are routinely translated. Which is also really neat!

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542737/

[2] https://academic.oup.com/nar/article/44/1/14/2499627

[3] What appears to be the seminal paper for modern ribosome profiling [4] was published in early 2014, more or less at the same time I left the organization where I might have heard about it.

[4] https://pubmed.ncbi.nlm.nih.gov/24468696/


It doesn’t answer your question, but Sean Carroll did a very interesting podcast interview with the biologist Michael Levin recently talking about basically this subject.

My layman’s summary of it is that the way genes express themselves is dependent on the surrounding environment so DNA alone doesn’t define how a single cell or larger organism will develop.

Again, as a layman, I’d guess that interaction effects, false correlations, the p>n problem (more predictors than data points), etc. would make finding any relations a lot harder than just running a regression.
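
A rough sketch of the false-correlation problem when there are far more predictors than data points. Everything here is simulated noise: the sample sizes, the seed, and the 0/1/2 "variant" coding are assumptions chosen for illustration, and nothing is taken from the paper.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
n_people, n_variants = 20, 2000  # far more predictors than samples

# Pure noise: the "trait" has nothing to do with any "variant".
trait = [rng.gauss(0, 1) for _ in range(n_people)]
variants = [[rng.choice((0, 1, 2)) for _ in range(n_people)]
            for _ in range(n_variants)]

best = max(abs(pearson(v, trait)) for v in variants)
print(f"strongest correlation found in pure noise: {best:.2f}")
```

With only 20 people and 2000 candidate predictors, the best of them will look convincingly correlated even though no real relationship exists, so a naive "run a regression and sort by correlation" needs a lot of samples and a correction for the number of tests.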


> My layman’s summary of it is that the way genes express themselves is dependent on the surrounding environment so DNA alone doesn’t define how a single cell or larger organism will develop.

Epigenetics is the study of how the environment affects gene expression (i.e. turning genes 'on' or 'off'), and it's a relatively new field.

https://www.cdc.gov/genomics/disease/epigenetics.htm

So yes, that's a level of additional complexity, and we're now also learning more about how our microbiome and virome affect our biology as well, which I'm sure will lead to even further complexity...

https://en.wikipedia.org/wiki/Human_microbiome

https://en.wikipedia.org/wiki/Human_virome


It's mainly down to availability of computing resources. This paper [1] describes a "genome-wide association study", which is a complex phrase for a complex method of comparing many genomes with one another. Put very simply, it involves comparing each genome with every other one in the study set, finding variations they have in common, and correlating those variations with the trait that's of interest in the study.

In terms of computational complexity, this is of course extravagantly exponential, and the length of the human genome, about 3 gigabases, makes it more expensive still. It's not just a simple comparison, either. There are many regions of the genome which will be identical or nearly so between individuals, because they code for the same things. But there's no guarantee that they'll be in the same place across two individuals' genomes, both because DNA doesn't exactly work that way, and because DNA sequencing doesn't either. So, before you can perform the comparison at the core of GWAS, you have to find and line up these common subsequences. This is called "local sequence alignment"; it is in itself quadratic, and you have to do it for every pair of genomes. So the total process, counting all that and the various ancillaries involved, is (n^2)^n plus a bit - a complexity class which, to my knowledge, no computer scientist has thus far dared to name, or not in printable terms at least.
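
To show where the "quadratic" in local sequence alignment comes from, here is a minimal sketch of the classic Smith-Waterman scoring recurrence. The match, mismatch, and gap values are arbitrary assumptions, and this is only meant to illustrate the cost of aligning one pair of sequences, not what any production pipeline runs.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman, linear gaps).

    Fills a (len(a)+1) x (len(b)+1) table, so time is O(len(a) * len(b)).
    Only the previous row is kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical sequences sharing the short stretch "ACG".
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))  # prints 6
```

Every cell of the table is visited once, so doubling both sequence lengths quadruples the work, which is the quadratic behaviour referred to above.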

Given this enormous requirement for both CPU and storage, it should come as no surprise that the first successful GWAS [4] was published only as recently as 2002, and that it was only about ten years ago that the technique really became feasible to deploy on a wide scale. Even in so short a time, though, it's proven an almost fantastically fecund field of study; to call it an almost fundamental revolution in the study of biological inheritance really isn't too strong a description, and I'm really looking forward to seeing what researchers develop with it over the course of the next couple of decades.

It's reasonable to wonder, specifically, why this technique should be able to tell us so much. Some traits - whether beneficial, harmful, or neutral - are heritable at the genetic level, i.e., in entire genes. These are currently described as "Mendelian traits" or, when not innocuous, "Mendelian disorders". Because those require only sufficient resolution to identify the presence or absence of specific genes, they were identifiable prior to the wide availability of GWAS. The variations identified by GWAS, by contrast, can be and often are as small as a difference in a single base (i.e. A instead of G, C instead of T), hence the frequently encountered term "single-nucleotide polymorphism" or SNP (pronounced 'snip'); NCBI has a database [2] of about three-quarters of a billion known SNPs, and about a hundred thousand of those [3] are considered likely to have clinical significance - that is, they're significantly more likely to be found in the genome of people who have, or go on to develop, some illness, to the point where awareness of their presence may usefully inform treatment of a patient who has them.
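
As a small illustration of what a single-base difference looks like in data, here is a sketch that lists the positions where two sequences differ. The sequences are made up and assumed to be already aligned to the same length; whether a given difference counts as a catalogued SNP depends on how common it is across a population, which this toy does not model.

```python
def single_base_differences(reference, sample):
    """Positions where two already-aligned, equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]

reference = "GATTACAGATTACA"  # hypothetical sequences
sample = "GATTGCAGATTACC"
print(single_base_differences(reference, sample))
# prints [(4, 'A', 'G'), (13, 'A', 'C')]
```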

It's important to note that these aren't genetic variations, but genomic; that is, while SNPs can and frequently do occur within genes, it's the study of the genome as a whole in which they are identified. They also aren't mutations; where a mutation is uncommon and generally has an observable effect on the phenotype of the organism in which it appears, SNPs are highly common and typically don't have any direct effect on the phenotype. That's why, of the about 760 million SNPs in the NCBI database, only about a hundred thousand are individually listed as clinically significant; typically it's with a combination of SNPs, rather than a single one, where a correlation is found.

This relatively weak signal is why such a complex, statistical method is required to identify these variations in the first place: mostly they don't affect the individual whose genome displays them, and even when they do, it's often only in a very subtle way, such as elevating or reducing risk of some disorder - something I actually talked about here fairly recently [6], discussing the results of a GWAS investigating possible associations with SIDS risk - in the event, the study found that there are some genomic associations with SIDS, but that they represent so slight an adjustment in actual risk that it's not reasonably possible even to evaluate that risk delta in the case of an individual infant.

This is solidly typical of the sort of results you find in a given GWAS paper, and it militates strongly against considering any individual paper as dispositive of much of anything in advance of close reading. For example, I recently addressed a concern here over whether a sizable (~80k) GWAS examining associations with sexuality, specifically with likelihood of being either heterosexual or not boring, could potentially lend itself to selective abortion or some other sort of intervention - shades of the old "gay gene" debate from the 90s. I was happy to be able to explain that, not only can the GWAS results not be used that way, the GWAS results pretty conclusively disprove that anything even exists in the human genome which could be used that way. But it's easy to understand how someone would be concerned! I was too, before I read the paper; while it isn't likely that such a targetable complex of variations might exist, neither is it impossible. But if a GWAS of that size failed to find it, that's because it isn't there to be found.

Another common genre is studies in which SNPs are found to cause variation in the way already known genes are expressed - this isn't really surprising, although it may sound so; the interactions between DNA and the RNA polymerase which produces RNA from it, and between RNA and the ribosome which uses the encoded information to build proteins, are physical, and the physical structure of these nucleic acid molecules is strongly dependent on the bases in which the information they carry is encoded. These structures can, and do, affect the function of the proteins that interact with these molecules. The eye color paper that we're talking about here [1] describes such a result; to pick a specific example, it identifies a SNP in a non-coding region of DNA, adjacent to the genes which have the strongest effect on eye color, whose presence or absence influences the likelihood with which those genes are expressed (i.e., affect the organism's phenotype), and thus influences the probability distribution of the eye colors which result. And, more broadly, this paper clarifies much that was previously unknown with regard to how eye colors, other than the already fairly well understood blue and brown, occur.

As it happens, I have hazel eyes, so this paper is pretty interesting to me! But in any case, I hope I've helped clarify some of the background for you, with regard to why and how papers like this come about, and how it can be that something as seemingly well-understood as the genomic origins of human eye color can still be open to so broad a clarification as this paper provides.

(I'm not a researcher myself; I just have friends and former colleagues in the field, and spent about a year working as a staff engineer in a genomics institute. Most of what I've just described, I picked up while I was there, and while I've done the best I could, my understanding may be both imprecise and outdated in ways which would dismay my erstwhile collaborators. If someone with more current or more accurate knowledge should happen by, I hope they'll take the time to set me straight wherever I need it!)

(Oh, also, it's not wise to put too much faith in "popularizers of science", whether that be Malcolm Gladwell or just the PR department of some university or other. They always screw it up. For example, "50 new genes for eye color" is just a straight-up lie - the paper neither identified nor sought to identify any new genes at all. I concede that a more detailed and accurate explanation, such as the one I've given, requires considerably more effort both to write and to read; no doubt it would be less likely to be read and understood even if it appeared in place of the junk that's linked in this HN submission. What I don't understand is, if they're not going to try to actually educate anyone but just put out a bunch of nonsense that's of no use to anyone, why they even bother at all.)

(Well, that's a bit of a lie, too. They do it because they believe it will help get them grants. I've never understood how that is supposed to work; at the institute where I worked, grants happened because of grant writers, and the grant writers there spent as far as I know none of their highly valuable time on misleading laypeople with glib pop sci crap. So maybe I'm not so sure after all.)

[1] https://advances.sciencemag.org/content/7/11/eabd1239

[2] https://www.ncbi.nlm.nih.gov/snp/

[3] https://www.ncbi.nlm.nih.gov/snp/?term=%22pathogenic%22%5BCl...

[4] https://pubmed.ncbi.nlm.nih.gov/12426569/

[5] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3543921/

[6] https://news.ycombinator.com/item?id=26263294


Excellent answer but I will add that phenotyping eye color also adds noise - 23andMe relies on questionnaires and the users have some discretion in identifying their own eye color for example. And since the trait seems so polygenic you do need a ton of samples.

The computational complexity isn't as big a deal because the data here is genotyped, not sequenced.
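
To make the "genotyped, not sequenced" point concrete, here is a minimal sketch of a per-variant association test on genotype data, where each person is just a 0/1/2 count of an allele at each variant. The variant names, genotypes, and the "eye_darkness" score are all made up, and a real analysis would also adjust for covariates such as ancestry and correct for the number of variants tested.

```python
def slope_and_r2(dosages, trait):
    """Least-squares fit of trait ~ dosage for one variant; returns (slope, r^2)."""
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in trait)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, trait))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # no variation, nothing to fit
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Hypothetical data: 6 people, 3 variants, dosage = copies of one allele.
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [1, 1, 0, 2, 0, 1],
    "snp_c": [0, 0, 0, 0, 0, 0],
}
eye_darkness = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]  # made-up quantitative score

results = {snp: slope_and_r2(d, eye_darkness) for snp, d in genotypes.items()}
for snp, (slope, r2) in sorted(results.items(), key=lambda kv: -kv[1][1]):
    print(f"{snp}: slope={slope:+.2f} r2={r2:.2f}")
```

Each variant is tested independently with one pass over the samples, so the cost grows roughly with (number of variants) x (number of people), with no pairwise genome comparison involved.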


Disappointed that they mean discovered and not created.


Indeed. I thought that the future suddenly got more evenly distributed


I remember learning the eye diagram and thinking that it didn’t make sense. Part of my family is Syrian and I don’t think anyone has the same eye color. My mom is hazel/green, I’m dark red/brown, my kids are blue/grey, and their mom has green/grey eyes.

I just always assumed that the understanding of eye color was biased towards White, European people.


At what point can we select offspring, or engineer them, with specific hair and eye colour?


Come on, that's the old way of thinking.

How can we express different eye colors, for our date, tonight? ;)


Coloured contacts...


Just take a bunch of mRNA shots.


You already could. The predictive power of the existing PGSes isn't too bad as it is; this improves it, but you could already steer the eye/hair color with considerable accuracy. The real problem is finding a service like GenPred which will do the embryo biopsies and can get out a usable genotype you can run said scores on.
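
For anyone unfamiliar with what "running a score on a genotype" means, here is a bare-bones sketch of a polygenic score: a weighted sum of allele counts. The variant names, weights, and genotypes are entirely hypothetical, and real scores involve far more variants plus calibration that this leaves out.

```python
# Hypothetical per-variant effect weights, in the style a published score might list.
weights = {"var_a": 0.8, "var_b": -0.3, "var_c": 0.5}

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())

# Made-up genotypes for three samples.
samples = {
    "sample_1": {"var_a": 2, "var_b": 1, "var_c": 0},
    "sample_2": {"var_a": 0, "var_b": 2, "var_c": 1},
    "sample_3": {"var_a": 1, "var_b": 0, "var_c": 2},
}

scores = {name: polygenic_score(g, weights) for name, g in samples.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # prints ['sample_3', 'sample_1', 'sample_2']
```

The arithmetic is trivial once the weights and a usable genotype exist, which fits the point above that getting the genotype is the hard part.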


Did they provide a list of variants in the paper?

If so, selection among embryos for the trait could be done now via IVF and Preimplantation Genetic Diagnosis.

Most companies wouldn't offer the service due to unwanted PR, but any competent bioinformatician could do it given the data; I'm sure some would do it for bitcoin.

Modifying humans is prohibited and dangerous with current technology due to off target effects.


Let's hope never. I'm a fairly average looking person with shocking blue eyes that let me punch above my weight.


Could mRNA ever cure my colour deficiency? Or do I misunderstand the biology?


Seems like the problem is the delivery vehicle.


mRNA doesn't alter your DNA, so no.


"The study, published today in Science Advances, involved the genetic analysis of almost 195,000 people across Europe and Asia."

Not to provoke or cause undue consternation, but:

sigh


Most probably the genome encoding is compressed. It will be interesting to find out what kind of compression algorithm it is compressed with, how this algorithm works, and how efficient and resilient it is in comparison with other known compression algorithms... And changing the genome seems similar to changing an application's executable binary files by writing specific bytes at specific positions in the files, without understanding how the whole application binary works, just trying and seeing "what will happen" if we change this part of the genome...



