Article in press. Mathematical Biosciences xxx (2016) xxx–xxx

A formulation of the foundations of genetics and evolution

Brian Edward Bahr∗
16299 Dakota Shores Dr., Park Rapids, MN 56470, United States

Article history: Received 6 December 2014; Revised 13 February 2016; Accepted 17 February 2016; Available online xxx

Keywords: Mathematical formulation; Mathematical simulation; Evolution; Genetics

Abstract

This paper proposes a formulation of theories of the foundations of genetics and evolution that can be used to mathematically simulate phenotype expression, reproduction, mutation, and natural selection. It will be shown that Mendelian inheritance can be mathematically simulated with expressions involving matrices and that these expressions can also simulate phenomena that are modifications to Mendel's basic principles, like alleles that give rise to quantitative effects and traits that are the expression of multiple alleles and/or multiple genetic loci.

© 2016 Elsevier Inc. All rights reserved.

1. Introduction

Similar to the way that Newton's formulation of the laws of motion can be used to mathematically simulate the trajectory of objects under the influence of forces, this paper proposes a formulation of the foundations of genetics and evolution that can be used to mathematically simulate phenotype expression, reproduction, mutation, and natural selection. This is not a new model of these phenomena but a mathematical representation of an organism with matrices that are acted on by functions designed to have the same effect on the representation as the biological processes listed above have on true organisms.
Accordingly, each organism is represented by its own matrices and each matrix is operated on separately, which means that simulating a population of any significant size demands a computer. Simulating these biological processes on paper is not as simple as using an equation that models them; however, a well-written computer program can make a simulation that is nearly as simple to operate. The main advantage of this formulation, though, is that we can observe the effects that each biological process has on the genotype and/or phenotype as a whole, which as we will see has several benefits when simulating natural selection. We will also see that, with this formulation, we are not constrained to modeling non-overlapping generations, nor are we constrained to using fitness values that are constant over time when we are simulating natural selection.

The majority of this paper will involve exploring the effects of each function and what each function can simulate. Each section in which a function is introduced will be followed by an example of the effects of that function on an organism's representation, culminating in an example simulation of a small population involving every function that has been presented. Before the particular functions are presented, though, some notation needs to be introduced.

∗ Tel.: +1 507 272 3874. E-mail address: [email protected]

1.1. Notation

This formulation uses square diagonal matrices with entries from $\mathbb{Z}_n$ (the ring of integers modulo n), where n will depend on the complexity of an organism's phenotype expression (as we will see later). The functions involved will operate on these matrices with matrix addition and multiplication (there will also be one action involving a calculation of the trace of a matrix).
All variables in italics used in this paper represent integers, so it will be automatically assumed and not specified that they are integers whenever a new variable is introduced; likewise, all matrices will be represented by boldface variables and will not necessarily be specified as matrices when they are introduced.

Fig. 1. Expression gate.
Fig. 2. Reproduction gate.

Please cite this article as: B.E. Bahr, A formulation of the foundations of genetics and evolution, Mathematical Biosciences (2016), http://dx.doi.org/10.1016/j.mbs.2016.02.005. 0025-5564/© 2016 Elsevier Inc. All rights reserved.

We will begin by distinguishing between two different matrix types. The first type of matrix, the genotype matrix, will be used to represent the genotype of an organism; in particular, each position on the diagonal of a genotype matrix will represent one allele from that organism's genotype (and genotype matrices can either be used to represent an organism's total genotype or a section of it). The second type of matrix we will call a phenotype matrix. The phenotype matrix will initially be constructed from operations on an organism's genotype matrices but, as we will see, its entries can also be altered by other functions. Accordingly, each entry along the diagonal of a phenotype matrix will represent one phenotypic trait that is either the expression of the organism's genotype, the result of environmental influences, or some combination of the two. The other matrices (which will be presented in later sections) exist in the environment space, which is a system of paths and gates.
Paths are simply the trajectory an organism must follow between gates (like a wire in an electric circuit), and gates contain functions that act on an organism's matrices and also determine which matrices leave on which paths. Now, when a matrix is added to or multiplied with an organism's matrix, this invariably produces a new matrix, so according to the definition of gates there are three things that can happen to this new matrix: it leaves the gate it was created in on the same path as the matrices that entered the gate; it follows a different path than the matrices that entered the gate; or it does not leave the gate. Likewise, the matrices that entered the gate can either leave on the same path, follow different paths, or be prevented from leaving the gate.

Thus we will single out three different types of gates: expression gates, reproduction gates, and alteration gates. In an expression gate, an organism's genotype matrices are operated on to create a phenotype matrix that leaves the gate it was created in on the same path as the organism's genotype matrices (Fig. 1). In a reproduction gate, matrices are generated from operations on the organism's genotype matrices, and these new matrices then follow a different path than the organism's matrices (Fig. 2). And in an alteration gate, one of the organism's matrices will be operated on to produce a matrix that leaves the gate on the same path as the organism's other matrices; however, the particular matrix that was operated on will not leave the gate (Fig. 3).

Fig. 3. Alteration gate.

Lastly, we will define an organism as any set of matrices that simultaneously enter or leave the same gate, so that a matrix produced in a gate either becomes included in the set of the organism's matrices, or it becomes included in the set of a new organism's matrices.
This means that expression gates create a phenotype matrix which becomes a part of the organism; reproduction gates leave the original organism's genotype matrices unchanged but create matrices for a new organism; and alteration gates replace one of the organism's matrices with a new matrix. In this manner, the action of having an organism enter an expression gate will be used in this formulation to simulate the biological phenomenon of phenotype expression; the action of having an organism enter a reproduction gate will be used to simulate the biological phenomenon of reproduction; and the action of having an organism enter an alteration gate will be used to simulate the biological phenomenon of mutation.

One final type of gate will be included to represent natural selection. A selection gate will contain a function that assesses the value of a certain entry in the organism's phenotype matrix and then uses that value to determine whether the organism leaves the gate; the path leaving a selection gate will always lead to a reproduction gate or another selection gate. So the action of having an organism enter a selection gate will be used to simulate natural selection.

We can see from the above definitions that an organism might, for example, be two genotype matrices and a phenotype matrix that simultaneously follow a path to a gate and then leave that gate together on another path to simultaneously enter another gate, etc. Now, a true organism really has a genotype for each cell in its body, and a set of genotype and phenotype matrices could conceivably be made for each cell in an organism, but for most cases we probably only need to distinguish between an organism's germ-line matrices and somatic matrices (matrices representing the genotype and phenotype of cells that contribute and do not contribute to gametes, respectively). This distinction will come into play in the reproduction and alteration actions.
From the definition of an organism it is also clear that a population of organisms must contain a collection of paths and gates for each individual organism, since two organisms cannot simultaneously enter the same gate. Thus a population can be represented by a collection of paths through the environment space that sets of matrices follow (which is why a population of any significant size demands a computer). An example involving a small population will be simulated in Section 6.1.

1.2. Constructing genotype matrices

One reason for choosing to use diagonal matrices in this formulation is so that we can use a mathematical operation that will act on the entire set of alleles in an organism's genotype, but will act on any two alleles if and only if they are interactive alleles (alleles, whether from different genetic loci or from the same genetic locus on a different homologous chromosome, etc., that interact to express a single phenotypic trait). With diagonal matrices, we can accomplish this by making a requirement on the arrangement of alleles within genotype matrices: if two alleles are interactive alleles, then those alleles must be represented by entries in separate genotype matrices and they must be located in the same position in their respective matrices. Let us translate this rule into mathematics.
Since diagonal matrices contain all zeroes except on the main diagonal, we can denote them as $\mathbf{a} = \langle a_1, \dots, a_k \rangle$ without loss of information; thus $a_i$ and $a_j$ represent non-interactive alleles for all $i \neq j$. And two entries from different matrices, say $a_x$ and $b_y$, represent interactive alleles if and only if $x = y$. Now, when we recall the nature of matrix addition and multiplication on diagonal matrices, we can see that this requirement results in entries being added or multiplied together if and only if they represent interactive alleles when genotype matrices are added or multiplied together. Since all off-diagonal entries in a diagonal matrix are 0, these operations act as follows:

$$\mathbf{a} + \mathbf{b} = \langle a_1 + b_1,\ a_2 + b_2,\ \dots,\ a_k + b_k \rangle$$

$$\mathbf{a}\mathbf{b} = \langle a_1 b_1,\ a_2 b_2,\ \dots,\ a_k b_k \rangle$$

So, with genotype matrices under the interactive allele requirement, we have two mathematical functions that can operate on the entire set of alleles in an organism's genotype and will act on two alleles if and only if they are interactive alleles. We should note, though, that this requirement means that even a haploid genotype might require more than one genotype matrix to represent it if at least one of its genes interacts with a gene at another locus to produce a single phenotypic characteristic. It also means that, if a certain gene is involved in the expression of multiple phenotypic characteristics, then that gene's allele would need to be represented by multiple entries in genotype matrices. Yet we can see that any alleles that are not interactive alleles can be represented in the same genotype matrix.
For example, if in a given pair of homologous chromosomes every allele on one chromosome interacts with the allele at the same genetic locus on the other chromosome (and only that allele), then we would only need two genotype matrices to represent these alleles and could represent each allele from one chromosome in one matrix and each allele from the other chromosome in the other matrix. And if there were multiple chromosome pairs that fit this pattern, their alleles could all be represented together in two matrices. Consequently, the minimum number of genotype matrices required to represent an organism's genotype is determined by the maximum number of alleles that interact to express a single phenotypic trait. However, since diagonal matrices must be the same size to operate on each other, and since some alleles might interact with fewer alleles than other alleles do, there might be "empty" spaces in the "extra" genotype matrices that must be filled with an identity element (an entry that does not represent an allele and, as we will see in the next section, is either 1 or 0).

To simplify this and later discussions, we will denote interactive alleles that are at the same genetic locus as interactive pairs, and we will denote alleles for which there is no other allele at the same genetic locus (like alleles on the Y chromosome) as single alleles. We will also denote genotype matrices as $\mathbf{g} = \langle g_1, \dots, g_k \rangle$ and index them as $\mathbf{g}^1, \dots, \mathbf{g}^h$.

Let us first consider the case of a diploid organism whose genotype contains only interactive pairs (no singles) that do not interact with any other alleles (in other words, every trait is a Mendelian trait). According to the interactive allele requirement, $g_x^i$ and $g_y^j$ can represent interactive alleles if and only if $x = y$, so we need at least two genotype matrices. But since each interactive pair does not interact with any other alleles, we can represent all interactive pairs in the same two genotype matrices.

$$\begin{aligned}
\mathbf{g}^1 &= \langle g_1^1, \dots, g_a^1 \rangle \\
\mathbf{g}^2 &= \langle g_1^2, \dots, g_b^2 \rangle
\end{aligned}$$

Let us suppose now that we have an organism whose genotype still contains only interactive pairs (no singles), but that some of them interact with other interactive pairs. This could be a diploid organism that contains traits whose expression involves the interaction of alleles at multiple genetic loci, or perhaps a 2m-ploid organism for any value of m. In this case, we will need multiple pairs of genotype matrices, by the same reasoning as above. However, if some interactive pairs interact with more pairs than others, then, since diagonal matrices must be the same size to operate on each other, there will be "empty" spaces in some of the genotype matrices, which must be filled with an identity element. For example, if just one of the interactive pairs interacted with one other interactive pair and we placed these pairs at $g_2$ in each genotype matrix, then we could put every other interactive pair in the first two genotype matrices and fill the other two matrices with the identity element appropriate for that position (for now we will denote the identity element at position e as $\mu_e$; we will see the reason for this choice in the next section).

$$\begin{aligned}
\mathbf{g}^1 &= \langle g_1^1,\ g_2^1,\ g_3^1,\ \dots,\ g_a^1 \rangle \\
\mathbf{g}^2 &= \langle g_1^2,\ g_2^2,\ g_3^2,\ \dots,\ g_a^2 \rangle \\
\mathbf{g}^3 &= \langle \mu_1,\ g_2^3,\ \mu_3,\ \dots,\ \mu_a \rangle \\
\mathbf{g}^4 &= \langle \mu_1,\ g_2^4,\ \mu_3,\ \dots,\ \mu_a \rangle
\end{aligned}$$

We can see that this will not violate the interactive allele requirement, since these identity elements do not represent alleles, and that it also ensures that every genotype matrix is the same size. (Alternatively, we could separate alleles into more genotype matrices so that all interactive pairs that do not interact with any other interactive pairs are in two matrices; all interactive pairs that interact with one other interactive pair are in four matrices; etc. It is merely a matter of aesthetic preference, and of how many matrices one wants to keep track of.)
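As a rough illustration of the padding scheme above, the following Python sketch stores each diagonal genotype matrix as a plain list of its diagonal entries and multiplies four matrices entrywise modulo n. The modulus 30, the particular entry values, and the helper name `diag_mul` are assumptions made only for illustration, and the identity element is taken to be 1 because the operation shown is multiplication.

```python
from functools import reduce

N = 30  # hypothetical modulus, chosen only for illustration


def diag_mul(a, b, n=N):
    """Entrywise product (mod n) of two diagonal matrices stored as lists."""
    if len(a) != len(b):
        raise ValueError("diagonal matrices must be the same size")
    return [(x * y) % n for x, y in zip(a, b)]


MU = 1  # multiplicative identity used to fill the "empty" positions

# Only the second position involves a second interactive pair,
# so g3 and g4 hold identity elements everywhere else.
g1 = [15, 15, 25]
g2 = [25, 25, 25]
g3 = [MU, 10, MU]
g4 = [MU, 25, MU]

product_of_pairs = reduce(diag_mul, [g1, g2])
product_of_all = reduce(diag_mul, [g1, g2, g3, g4])
print(product_of_pairs)
print(product_of_all)
```

In this sketch the padded positions of the full product agree with the product of the first two matrices alone, which is the point of filling the extra matrices with an identity element.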
Finally, let us consider an organism whose genotype contains interactive pairs and single alleles, some of which interact. This could be a diploid organism that contains traits whose expression involves the interaction of alleles at multiple genetic loci, some of which might be on a sex chromosome, or perhaps an m-ploid organism for any value of m. This will be the same as the last case, except for the single alleles; however, we can make single alleles "interact" with an identity element so that this case becomes exactly the same as the last one, and the interactive allele requirement will still be satisfied if we do this (it will just mean that we need to pay a little more attention when using the reproduction action, as we will see later). So, if we build on the last example, we might have something like the following (where certain entries have been suggestively labeled as x and y in order to show an example arrangement of alleles from X and Y chromosomes).

$$\begin{aligned}
\mathbf{g}^1 &= \langle g_1^1,\ g_2^1,\ g_3^1,\ \dots,\ \mu_a,\ \dots,\ \mu_c,\ y_1^1,\ \dots,\ y_e^1 \rangle \\
\mathbf{g}^2 &= \langle g_1^2,\ g_2^2,\ g_3^2,\ \dots,\ x_1^2,\ \dots,\ x_b^2,\ \mu_d,\ \dots,\ \mu_e \rangle \\
\mathbf{g}^3 &= \langle g_1^3,\ g_2^3,\ g_3^3,\ \dots,\ \mu_a,\ \dots,\ \mu_c,\ \mu_d,\ \dots,\ \mu_e \rangle \\
\mathbf{g}^4 &= \langle g_1^4,\ g_2^4,\ \mu_3,\ \dots,\ \mu_a,\ \dots,\ \mu_c,\ \mu_d,\ \dots,\ \mu_e \rangle \\
\mathbf{g}^5 &= \langle g_1^5,\ \mu_2,\ \mu_3,\ \dots,\ \mu_a,\ \dots,\ \mu_c,\ \mu_d,\ \dots,\ \mu_e \rangle \\
\mathbf{g}^6 &= \langle \mu_1,\ \mu_2,\ \mu_3,\ \dots,\ \mu_a,\ \dots,\ \mu_c,\ \mu_d,\ \dots,\ \mu_e \rangle
\end{aligned}$$

And we can see from this that haploids can also be represented by pairs of genotype matrices. For example, a haploid organism whose genotype contains only single alleles that interact with no other alleles would simply have one matrix containing its alleles and one matrix containing all identity elements. Or, if some of its single alleles interacted with one other allele, it might look like the following:

$$\begin{aligned}
\mathbf{g}^1 &= \langle g_1^1,\ g_2^1,\ g_3^1,\ g_4^1,\ g_5^1,\ g_6^1 \rangle \\
\mathbf{g}^2 &= \langle \mu_1,\ g_2^2,\ g_3^2,\ \mu_4,\ g_5^2,\ \mu_6 \rangle
\end{aligned}$$

Consequently, we only need to consider cases involving pairs of genotype matrices, because we can always pair a single allele with an identity element. Additionally, we might want to include "extra" pairs of genotype matrices containing only identity elements for each organism if we want to represent certain forms of chromosome mutation, like duplication, in the same way as the other forms of alteration. (We will see why in Section 4.) Furthermore, we might want to include "extra" identity elements in each genotype matrix so that we can add entries without having to change the size of any matrices. There are of course ways to change the size of each matrix, but this runs the risk of shifting entries in undesired ways.

2. The formulation

2.1. Phenotype expression

Phenotype expression involves having an organism that consists only of genotype matrices enter a phenotype expression gate, which contains a function that acts on those genotype matrices to produce phenotype matrices; these phenotype matrices then leave the gate simultaneously on the same path as the genotype matrices. To simplify the following discussion, we will first divide phenotype expression into two different types and then combine them into a general phenotype expression function.

2.2. Multiplicative phenotype expression

Let us first investigate how to use this formulation to represent Mendel's Law of Dominance (which, expressed rigorously, is the relationship between two alleles, A and a, in which the interaction of A and A expresses the same trait as the interaction of A and a) along with the first half of the Principle of Segregation (that each phenotypic characteristic is the expression of two interactive alleles). Clearly the first half of the Principle of Segregation limits us to just two genotype matrices, and it is easy enough to see that these rules can be encapsulated mathematically as:

$$\mathbf{g}^1 \mathbf{g}^2 = \mathbf{p}$$

where $\mathbf{g}^1$, $\mathbf{g}^2$ contain entries that are elements of $\mathbb{Z}_2$. This is due to the fact that the only elements of $\mathbb{Z}_2$ are 0 and 1, and $0 \times 0 \equiv 0 \pmod 2$ and $0 \times 1 \equiv 0 \pmod 2$, so 0 can represent the allele denoted as A and 1 can represent the allele denoted as a from above.

Yet we can extend this idea further when we focus on a special subset of $\mathbb{Z}_n$ which we will denote $\text{Ж}_n$. The set $\text{Ж}_n$ will be defined as $\{u \in \mathbb{Z}_n \mid uu = u\}$; in other words, $\text{Ж}_n$ is the set of all multiplicatively idempotent elements in $\mathbb{Z}_n$. For example, $\text{Ж}_2 = \{0, 1\}$, as we just saw; $\text{Ж}_6 = \{0, 1, 3, 4\}$, because $3 \times 3 \equiv 3 \pmod 6$, $4 \times 4 \equiv 4 \pmod 6$, etc.; and $\text{Ж}_{30} = \{0, 1, 6, 10, 15, 16, 21, 25\}$. An interesting property of the elements of $\text{Ж}_n$ is that for certain $u, v \in \text{Ж}_n$, $uv = u$. For example, the numbers 15 and 25 from $\text{Ж}_{30}$ exhibit this property, since $15 \times 25 \equiv 15 \pmod{30}$ (and trivially any u multiplied by 1 is congruent to u). The Theorem of Dominance (proved in the Appendix) tells us that this is the case whenever the greatest common divisor of v and n also divides u.
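The idempotent sets quoted above are easy to recompute by brute force. The sketch below lists the idempotent elements for a given n and then compares the gcd condition quoted from the Theorem of Dominance with direct multiplication over every pair of idempotents in $\mathbb{Z}_{30}$. The function names `idempotents` and `dominates` are hypothetical helpers, not notation from the paper, and the check is only a numerical spot check, not a proof.

```python
from math import gcd


def idempotents(n):
    """Multiplicatively idempotent elements of Z_n (u*u congruent to u mod n)."""
    return [u for u in range(n) if (u * u) % n == u]


def dominates(u, v, n):
    """True when u*v is congruent to u modulo n."""
    return (u * v) % n == u % n


print(idempotents(6))
print(idempotents(30))

n = 30
elems = idempotents(n)
# Pairs that satisfy the gcd condition but fail the product test;
# the quoted theorem says this list should be empty.
violations = [
    (u, v)
    for u in elems
    for v in elems
    if u % gcd(v, n) == 0 and not dominates(u, v, n)
]
print(violations)
```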
It is also proved in the Appendix that there is an element of $\text{Ж}_n$ for each unique combination of the distinct prime divisors of n, so the number of pairs of elements of $\text{Ж}_n$ in which $uv = u$ depends on the number of prime divisors of n. Now, since by definition $uu = u$ for all $u \in \text{Ж}_n$, then for any $u, v \in \text{Ж}_n$ in which $uv = u$, the phenotype expression action of genotype entries containing u and v can replicate the relationship between any two true alleles A and a in which the combination of A and A expresses the same trait as the combination of A and a. Consequently, we will make the following shorthand definition in this formulation: given two entries u and v, if the result of multiplying uv equals u, then it will be said that u dominates v.

We can also see that, if n contains enough prime divisors, then we can construct a dominance hierarchy where w dominates z; v dominates both w and z; u dominates v, w, and z; etc., which we will denote as $u \succ v \succ w \succ z$. This can be achieved if w contains the greatest common divisor of z and n plus at least one other divisor of n, v contains the greatest common divisor of w and n plus at least one other divisor of n, u contains the greatest common divisor of v and n plus at least one other divisor of n, etc. For example, from the set $\text{Ж}_{30}$ there is the dominance hierarchy $0 \succ 15 \succ 25 \succ 1$, and in the set $\text{Ж}_{210}$ there is $0 \succ 105 \succ 175 \succ 85 \succ 1$.

To provide a concrete example, we can represent the alleles that determine mallard duck feather pattern with the hierarchy $105 \succ 175 \succ 85$ from $\text{Ж}_{210}$. The gene that determines feather pattern contains three alleles, usually denoted $M^R$, $M$, and $m^d$. The interaction of $M^R$ with $M^R$ produces the same trait (the restricted feather pattern) as the interaction of $M^R$ with $M$ or $m^d$; furthermore, the interaction of $M$ with $M$ produces the same trait (the mallard feather pattern) as the interaction of $M$ with $m^d$; and there is a third trait (the dusky feather pattern) that is only produced by the interaction of $m^d$ with $m^d$ [1].
Accordingly, if 105, 175, and 85 represent the alleles $M^R$, $M$, and $m^d$ respectively, then we can see that, because $105 \times 175 \equiv 105 \pmod{210}$ and $105 \times 85 \equiv 105$, the restricted trait can be represented by 105; because $175 \times 85 \equiv 175$, the mallard trait can be represented by 175; and the dusky trait can be represented by 85.

Before we proceed further, let us define two elements, a and b, of $\mathbb{Z}_n$ as extraneously prime if a contains a prime factor of n that is not in b and vice versa. So, for example, if n = 30, then 6, 15, and 10 are all extraneously prime to each other, and 16 and 21 are also extraneously prime to each other (but 15 and 25 are not). Clearly from this definition, for any $u, v \in \text{Ж}_n$ that are extraneously prime, the greatest common divisor of v and n does not divide u and the greatest common divisor of u and n does not divide v; therefore, by the Theorem of Dominance, u does not dominate v and v does not dominate u, so $uv \equiv w \pmod n$ (where $u \neq v \neq w$). In this case, the phenotype expression action of genotype entries containing u and v can replicate the relationship between any true alleles A and B in which the combination of A and B expresses a trait that is different from the trait expressed by A and A and from the trait expressed by B and B. Thus, we can use extraneously prime entries to represent traits that are the expression of alleles with a co-dominant relationship, which, expressed rigorously, is the relationship between two alleles in which the interaction of each combination of alleles expresses a different trait. Additionally, if n contains enough prime divisors, we can also construct multiple hierarchies in which the members of each hierarchy are extraneously prime to the members of every other hierarchy.
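The definition of extraneously prime elements can be sketched in a few lines of Python. The reading used here is an assumption: "a contains a prime factor of n that is not in b" is taken to mean that some prime divisor of n divides a but not b. The helper names `prime_divisors`, `shared_primes`, and `extraneously_prime` are hypothetical.

```python
def prime_divisors(n):
    """Distinct prime divisors of n, found by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def shared_primes(a, n):
    """Prime divisors of n that also divide a."""
    return {p for p in prime_divisors(n) if a % p == 0}


def extraneously_prime(a, b, n):
    """Each element has a prime factor of n that the other lacks (assumed reading)."""
    pa, pb = shared_primes(a, n), shared_primes(b, n)
    return bool(pa - pb) and bool(pb - pa)


n = 30
for a, b in [(6, 15), (6, 10), (10, 15), (16, 21), (15, 25)]:
    # Print the pair, the predicate, and the product modulo n.
    print(a, b, extraneously_prime(a, b, n), (a * b) % n)
```

For the extraneously prime pairs listed in the text, the printed product differs from both factors, which matches the co-dominant behaviour described above.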
For example, if a population contains the set of entries {106, 36, 175, 85} from $\text{Ж}_{210}$ at a given position, then we have the dominance hierarchies $36 \succ 106$ and $175 \succ 85$ (each entry dominates the entry to its right), and the members of each hierarchy are extraneously prime to the members of the other hierarchy.

As a concrete example of this, we can represent the alleles that determine ABO blood type with the hierarchies $15 \succ 25$ and $10 \succ 25$ from $\text{Ж}_{30}$. The gene that determines ABO blood type contains three alleles, often denoted $I^A$, $I^B$, and $i$. The interaction of $I^A$ with $I^A$ produces the same trait (the A antigen) as the interaction of $I^A$ with $i$; likewise, the interaction of $I^B$ with $I^B$ produces the same trait (the B antigen) as the interaction of $I^B$ with $i$; however, the interaction of $I^A$ with $I^B$ produces both the A and the B antigens on the blood cell; and when $i$ interacts with $i$, no antigens are produced [1]. Accordingly, if 15, 10, and 25 represent the alleles $I^A$, $I^B$, and $i$ respectively, then we can see that, because $15 \times 25 \equiv 15 \pmod{30}$, the lone A antigen trait can be represented by 15; because $10 \times 25 \equiv 10$, the lone B antigen trait can be represented by 10; because $15 \times 10 \equiv 0$, the A and B antigen trait can be represented by 0; and the absence of an antigen can be represented by 25.
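The ABO encoding above can be tabulated directly. The following sketch multiplies every unordered pair of the three allele values modulo 30 and looks up the resulting trait. The dictionary names and the `express` helper are hypothetical; the numeric encoding itself is the one given in the text.

```python
from itertools import combinations_with_replacement

N = 30
ALLELES = {"IA": 15, "IB": 10, "i": 25}  # encoding taken from the text
TRAITS = {
    15: "A antigen",
    10: "B antigen",
    0: "A and B antigens",
    25: "no antigen",
}


def express(x, y, n=N):
    """Multiplicative expression of two interactive entries."""
    return (x * y) % n


def blood_type_table():
    """Map each unordered allele pair to the trait its product represents."""
    table = {}
    pairs = combinations_with_replacement(ALLELES.items(), 2)
    for (name1, v1), (name2, v2) in pairs:
        table[(name1, name2)] = TRAITS[express(v1, v2)]
    return table


for genotype, trait in blood_type_table().items():
    print(genotype, "->", trait)
```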
The fact that elements of $\text{Ж}_n$ can represent these dominance and co-dominance relationships is the main reason for choosing to use elements of $\text{Ж}_n$, but another reason is so that we can represent complete dominance, where the expression of an allele by itself produces the same trait as the expression of that allele interacting with another copy of itself. An example is an allele on the X chromosome that produces the same trait when it interacts with the same allele on the other X chromosome in females as it does when it does not interact with any other alleles in males (who lack a second X chromosome). Clearly, any element of $\text{Ж}_n$ can represent such alleles, since they are all idempotent elements.

On the other hand, the elements of $\mathbb{Z}_n - \text{Ж}_n$ can be used to represent alleles in which the expression of that allele by itself produces a different trait than the expression of that allele interacting with another copy of itself, since these elements are by definition not multiplicatively idempotent. Consequently, there are elements of $\mathbb{Z}_n - \text{Ж}_n$ that can interact with the elements of $\text{Ж}_n$ to represent a haploinsufficient relationship between alleles, where the expression of A alone produces the same trait as the expression of A with a but is different from the expression of A with A and the expression of a with a (which are both different). The Divisor Lemma (proved in the Appendix) shows that for all $u \in \text{Ж}_n$, if $\gcd(u, n) = \delta$, then $\delta u \equiv \delta \pmod n$. So for $u \neq \delta\delta$, we can represent this relationship between alleles using u and $\delta$. For example, in $\mathbb{Z}_{30}$, $2 \times 16 \equiv 2 \pmod{30}$, but $2 \times 2 \equiv 4$ and $16 \times 16 \equiv 16$. In Section 6.3, we will look at some possible relationships between alleles that the elements of $\mathbb{Z}_n - \text{Ж}_n$ can also be used to represent.

Let us now extend the expression function beyond the Principle of Segregation to include organisms whose genotype must be represented by multiple pairs of genotype matrices.
For this case, let us investigate the following multiplicative phenotype expression function:

$$E(\mathbf{g}^1, \dots, \mathbf{g}^h) = \prod_{j=1}^{h} \mathbf{g}^j = \mathbf{p}$$

Suppose we have a population in which organisms contain the entries {a, d} at $g_e^1$ and $g_e^2$ and the entries {b, c} at $g_e^3$ and $g_e^4$, where $a \succ b \succ c \succ d$, that is, each entry dominates every entry to its right (and all other genotype matrices, if any, contain a 1 at this position). Since a dominates every other entry, $p_e$ will equal a for any combination containing a; since b dominates c and d, $p_e$ will equal b for any combination containing b but not a; and since c dominates d, $p_e$ will equal c for any combination that contains neither a nor b. So when every distinct combination of these entries is calculated, this function will produce the entries $p_e = a$, $p_e = b$, and $p_e = c$ in the ratio 12:3:1. This can replicate the relationship found in dominant epistasis, where for four true alleles A, B, a, and b, the combination of A and A expresses the same trait as the combination of A with any other allele; the combination of a and a with B and B expresses a different trait than the first but one that is the same as the trait expressed by the combination of a and a with B and b; and the combination of a and a with b and b produces a trait different from the other two traits.

And it is not too difficult to see that if organisms contain the entries {a, d} at both $g_e^1$, $g_e^2$ and $g_e^3$, $g_e^4$, this function will produce the entries $p_e = a$ and $p_e = d$ in the ratio 15:1 (that of complementary epistasis), since $p_e = d$ if and only if all four entries are d. (We will investigate representing other types of epistasis in Section 6.)

So we can see that representing alleles with elements of $\text{Ж}_n$ extends Mendel's Law of Dominance to include co-dominance and also allows for the representation of genes with more than two variations that interact in various combinations of dominance and co-dominance.
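The 12:3:1 and 15:1 ratios can be reproduced by enumerating all sixteen combinations of entries. The sketch below assumes the concrete values a, b, c, d = 0, 105, 175, 85 from the $\text{Ж}_{210}$ hierarchy quoted earlier; that choice, and the helper names `express` and `phenotype_counts`, are illustrative assumptions rather than values used in the paper's own example.

```python
from collections import Counter
from itertools import product

N = 210
# Hypothetical concrete values: each entry dominates the ones after it.
a, b, c, d = 0, 105, 175, 85


def express(entries, n=N):
    """Multiplicative phenotype expression of the entries at one position."""
    result = 1
    for x in entries:
        result = (result * x) % n
    return result


def phenotype_counts(pair1, pair2):
    """Count phenotype entries over every combination of two interactive pairs."""
    counts = Counter()
    for combo in product(pair1, pair1, pair2, pair2):
        counts[express(combo)] += 1
    return dict(counts)


two_locus = phenotype_counts((a, d), (b, c))      # entries {a, d} and {b, c}
same_entries = phenotype_counts((a, d), (a, d))   # entries {a, d} at both pairs
print(two_locus)
print(same_entries)
```

With these values the first count splits 12:3:1 over a, b, and c, and the second splits 15:1 over a and d.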
Additionally, certain elements of $Z_n - Ж_n$ paired with elements of $Ж_n$ can be used to represent haploinsufficiency.

2.3. Additive phenotype expression

Additive phenotype expression will be used to represent the expression of alleles that give rise to traits that differ in some measurable way (like height, litter size, etc.). Specifically, it will be used to represent genes whose variations differ only in the quantity of some substance that the gene contributes to the characteristic (like pigment, growth hormone, etc.), such that a difference in the quantity contributed by a gene causes a measurable difference in the characteristic. We will use the following function to represent this type of phenotype expression:

$$E(g^1, \ldots, g^h) = \sum_h g^h = p$$

So the difference between two traits $p_e$ and $p_e'$ arising from different genotypes is:

$$p_e - p_e' = \sum_h g_e^h - \sum_h g_e^{h\prime}$$

Now if all $g_e^1, \ldots, g_e^h$ and $g_e^{1\prime}, \ldots, g_e^{h\prime}$ are equal except $g_e^i$ and $g_e^{i\prime}$, where $g_e^i - g_e^{i\prime} = \Delta$, then $\sum_h g_e^h - \sum_h g_e^{h\prime} = \sum_h g_e^h - \sum_h g_e^h + \Delta$. Consequently, $p_e - p_e' = \Delta$, and a difference in the value of two genotype entries will cause an equal difference in the phenotype entries they express. Since this difference is equal, genotype entries must be chosen according to the effect the alleles they represent have on the trait they express. For instance, if in a certain population two interactive pairs each contribute 0, 1, or 2 doses of some substance that influences a plant's height, but each dose from one pair causes the plant to grow twice as tall as a dose from the other pair, then the sets of entries representing the two pairs cannot be equal even though the number of doses is equal. However, there is some latitude in the entries we choose, since their differences are invariant when the same number is added to every entry at that position. So we could choose the two sets to be {0, 1, 2} and {0, 2, 4}, but if the phenotype entry they express represents a plant's height that varies between 10 and 16 cm, it might be more favorable to use {5, 6, 7} and {5, 7, 9} instead. In this way, then, this action can represent the expression of alleles that contribute to traits in a measurable way.

Now, certain traits (like height) are usually described as varying "continuously" due to environmental influences (a better definition, which avoids any pitfalls of limitlessness, would be to say that these are traits in which the more precisely one measures, the more varieties there are to be found). In other words, traits that differ in measurable ways can sometimes be found to have values different from the values produced strictly by the expression of an organism's genotype when these traits are influenced by the environment. Since phenotype matrices can only contain integers, we need a way for the alteration action (to be introduced in Section 4) to change an entry to a value different from the values produced by the phenotype expression function. This can be accomplished by multiplying each genotype entry at a certain position (for every organism in the population) by a constant of measure, $\kappa$. So, instead of containing {a, b, …, z}, the set of genotype entries in the population would be {$\kappa$a, $\kappa$b, …, $\kappa$z}.
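The plant-height example can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, $\kappa = 10$ is an arbitrary illustrative constant of measure, each set contributes one value per interactive pair, and all sums are assumed to stay below the modulus n so that no reduction mod n occurs.

```python
def additive_traits(*entry_sets, kappa=1):
    """All phenotype values obtainable by adding one entry from each set,
    with every genotype entry first multiplied by the constant of measure."""
    totals = {0}
    for entries in entry_sets:
        totals = {t + kappa * x for t in totals for x in entries}
    return sorted(totals)


def gaps(values):
    """Differences between consecutive phenotype values."""
    return [hi - lo for lo, hi in zip(values, values[1:])]


base = additive_traits({0, 1, 2}, {0, 2, 4})              # dose counts only
shifted = additive_traits({5, 6, 7}, {5, 7, 9})           # heights in cm
scaled = additive_traits({5, 6, 7}, {5, 7, 9}, kappa=10)  # assumed kappa = 10

print(base, shifted, scaled)
print(gaps(base) == gaps(shifted))    # adding a constant preserves differences
print(gaps(scaled) == gaps(shifted))  # multiplying by kappa does not
```

The comparison of `gaps` makes the design point explicit: shifting every entry leaves differences unchanged, while scaling by a constant of measure stretches them and so opens up intermediate integer values.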
This also means that the values that the phenotype entries represent must be scaled accordingly into different units, since differences are not invariant under multiplication. For instance, if in the above example we use $\kappa = 10$, then each entry in the phenotype matrix should represent the plant's height in millimeters instead of centimeters, because the entries in the phenotype matrix would vary between 100 and 160. Phenotype expression will still only produce entries of 100, 110, 120, 130, 140, 150, and 160 mm; but, with the alteration action, these entries can also be changed to assume values like 101, 115, 137, etc. Thus the constant of measure should be selected with a large enough value to encompass the number of varieties that can be found with the given precision of measurement.

2.4. The general phenotype expression function

We can either require that all multiplicative genotype entries be in separate matrices from additive genotype entries, or we can combine them in the same matrices and combine the two types of phenotype expression into the following function:

$$E(g^1, \ldots, g^h) = \Big(\prod_h g^h\Big)\mu + \Big(\sum_h g^h\Big)\alpha = p$$

with the constraint that $\mu + \alpha = I$ (where $I$ is the identity matrix and $\mu$ and $\alpha$ are binary diagonal matrices, containing only 1s and 0s). Since this constraint means that everywhere $\mu$ has a 1, $\alpha$ has a 0 and contrariwise (on the diagonal), all entries in $g^i$ are acted on by one (and only one) of the two types of phenotype expression, because if $\alpha_e = 0$, then the equation will be $\prod_h g_e^h + 0 = p_e$ and, similarly, if $\mu_e = 0$, then the equation will be $0 + \sum_h g_e^h = p_e$. Now, it has been mentioned a few times that certain positions must be filled with identity elements if we want to limit the number of matrices required to represent an organism's genotype.
It is clear now that, for positions in which the entries interact according to $\prod_h g^h$, the appropriate identity element to use is 1, the multiplicative identity element of $Z_n$; and, for positions in which the entries interact according to $\sum_h g^h$, the appropriate identity element to use is 0, the additive identity element of $Z_n$. It is also easy enough to see that the appropriate identity element for any $g_e^i$ is equal to $\mu_e$. To demonstrate this combined action, let us suppose that an organism enters a phenotype expression gate containing the above function and the following two matrices:

$$\mu = \mathrm{diag}(1, 1, 0, 0, 0, 1, 0)$$
$$\alpha = \mathrm{diag}(0, 0, 1, 1, 1, 0, 1)$$

And the organism will be a set of the following genotype matrices:

$$g^1 = \mathrm{diag}(85, 106, 10, 10, 10, 15, 6)$$
$$g^2 = \mathrm{diag}(85, 175, 10, 10, 30, 21, 12)$$
$$g^3 = \mathrm{diag}(105, 1, 60, 20, 20, 141, 0)$$
$$g^4 = \mathrm{diag}(175, 1, 30, 10, 40, 1, 0)$$

Then, the following phenotype matrix will be produced in this gate:

$$p = \mathrm{diag}(0, 70, 110, 50, 100, 105, 18)$$

3. Reproduction

Fig. 4. Sexual reproduction.

As stated in the introduction, the action of having an organism enter a reproduction gate will be used to simulate reproduction. Now, since sexual reproduction in diploid organisms is the creation of a new organism through the fusion of gametes which contain a copy of one allele from each genetic locus in their progenitors' genotypes, this requires an action that creates genotype matrices for a new (temporary) organism that contain a copy of one entry from each position for each pair of a progenitor organism's germline genotype matrices, and then these genotype matrices must be combined with those of another (temporary) organism that was created in the same way.
And since asexual reproduction is the creation of a new organism which contains copies of the alleles from one progenitor, we need an action that will copy entries from a progenitor organism’s genotype matrices in a similar way, but will not combine with those of another organism. This action will involve reproduction gates which contain a reproduction function and recombination matrices that act on an organism’s genotype matrices to produce new matrices. Recombination matrices will be used to determine what entries are copied from the progenitor organism’s genotype matrices (and also whether there is genetic recombination) to the new matrices. And these newly produced matrices will leave the gate on a different path from the matrices of the organism that entered the gate and proceed to a phenotype expression gate (although they can also encounter alteration gates in between) where they become the genotype matrices for a new organism. Furthermore, for representing reproduction involving two progenitors, each organism is acted on in their own reproduction gate to create matrices that then proceed to the same phenotype expression gate (Fig. 4). In this case, even though the new matrices came from different gates, they will by deﬁnition become one organism, since they will simultaneously enter the same gate. And they will therefore all be used to express the phenotype of this new organism. So, in this formulation, when genotype matrices created from multiple organisms enter the same gate (and thus combine into one organism) we will use this action to represent the fusion of gametes and when matrices from only one organism proceed to a gate that expresses its phenotype we will use this action to represent asexual reproduction. Let us ﬁrst consider the creation of a gamete by a diploid organism whose genotype contains only interactive pairs; in other words, a genotype consisting solely of Mendelian traits. 
Such a genotype can consequently be represented by two genotype matrices. Now, according to Mendel's Law of Segregation, an organism will contribute one of the alleles from each locus to the gamete, so an operation on the organism's genotype matrices representing this should result in one matrix that contains one of the two entries from the organism's genotype matrices for each position along the diagonal. If, for the two genotype matrices, we construct two binary diagonal recombination matrices $r^1$ and $r^2$ and define the relation between the two recombination matrices as $r^1 + r^2 = I$, we can use the following function to accomplish this:

$$R(g^1, g^2) = g^1 r^1 + g^2 r^2 = g^*$$

The requirement that $r^1 + r^2 = I$, which we will denote recombination entry symmetry, means that everywhere $r^1$ has a 1, $r^2$ has a 0 and contrariwise (on the diagonal). It follows then that the sum $g^1 r^1 + g^2 r^2$ will equal a matrix in which the entry at the jth locus is either $g_j^1$ or $g_j^2$ because, in the expression $g_j^1 r_j^1 + g_j^2 r_j^2$, either $g_j^1 r_j^1 = g_j^1$ and $g_j^2 r_j^2 = 0$, or $g_j^1 r_j^1 = 0$ and $g_j^2 r_j^2 = g_j^2$. Therefore, this operation will copy the entry from one or the other of the genotype matrices into a single matrix for each position along the diagonal for this case.
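As a small check of the function above, the sketch below enumerates every recombination matrix for a cross between two heterozygotes at a single position and expresses each offspring multiplicatively. The modulus n = 6 and the entries 3 and 1 are assumptions made only for illustration (3 is idempotent in $Z_6$ and $3 \times 1 \equiv 3$, so 3 plays the dominant allele); the function names are hypothetical.

```python
from collections import Counter
from itertools import product

N = 6                    # assumed modulus (illustrative)
DOMINANT, RECESSIVE = 3, 1   # assumed entries: 3*3 = 3 and 3*1 = 3 (mod 6)


def gamete(g1, g2, r1):
    """R(g1, g2) = g1 r1 + g2 r2 with r2 = I - r1 (recombination entry symmetry).
    Diagonal matrices are stored as lists of their diagonal entries."""
    return [x * r + y * (1 - r) for x, y, r in zip(g1, g2, r1)]


def express(g1, g2, n=N):
    """Multiplicative phenotype expression of one pair of genotype matrices."""
    return [(x * y) % n for x, y in zip(g1, g2)]


parent = ([DOMINANT], [RECESSIVE])   # heterozygote with a single position

offspring = Counter()
for r_first, r_second in product(([0], [1]), repeat=2):
    g1 = gamete(*parent, r_first)    # gamete contributed by the first parent
    g2 = gamete(*parent, r_second)   # gamete contributed by the second parent
    offspring[express(g1, g2)[0]] += 1

print(dict(offspring))   # counts of each expressed entry over the four crosses
```

Enumerating the four equally weighted choices of recombination matrices, rather than drawing them at random, gives the classical monohybrid proportions exactly.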
Thus, the reproduction action with recombination entry symmetry acting on two genotype matrices can mathematically express Mendel's Law of Segregation acting on Mendelian traits.

Let us now deal with the more general case of an organism whose genotype must be represented by multiple pairs of genotype matrices. Now, since, for interactive pairs, each matrix pair represents the alleles from a single genetic locus and we want a copy of one allele from each genetic locus, the operation on each pair of genotype matrices should still result in one matrix that contains one of the two entries from the pair for each position along the diagonal; however, since single alleles can be used to represent alleles on sex chromosomes, we may need extra constraints placed on the entries in recombination matrices, since sex chromosomes often exhibit genetic linkage in one sex. In general, then, we can use recombination matrices that exhibit recombination entry symmetry in pairs, $r^{\eta} + r^{\eta+1} = I$ (where $\eta$ ranges over the odd values of the sequence 1, 2, …, h), and the general reproduction function:

$$R_{\eta}(g^{\eta}, g^{\eta+1}) = g^{\eta} r^{\eta} + g^{\eta+1} r^{\eta+1} = g^{\eta *}$$

By the same logic as before, it follows that the sum $g^{\eta} r^{\eta} + g^{\eta+1} r^{\eta+1}$ will equal a matrix in which the entry at the jth locus is either $g_j^{\eta}$ or $g_j^{\eta+1}$. Therefore, for each pair of genotype matrices, this operation will copy the entry from one of the matrices for each position along the diagonal and will result in a single genotype matrix for each pair of genotype matrices for this case. (And clearly, the first case was a special case of this one with $\eta = 1$.)

However, we will need to put extra constraints on the recombination matrices in order for them to represent genetic linkage. For example, suppose we had the following genotype matrices (where entries denoted $x_i$ are entries from an X chromosome and entries denoted $y_j$ are entries from a Y chromosome):

$$g^1 = \mathrm{diag}(g_1^1, g_2^1, g_3^1, \ldots, \mu_a, \ldots, \mu_b, y_1^1, \ldots, y_c^1)$$
$$g^2 = \mathrm{diag}(g_1^2, g_2^2, g_3^2, \ldots, x_1^2, \ldots, x_b^2, \mu_{b+1}, \ldots, \mu_c)$$

where $x_1$ begins at $g_a$ and ends at $g_b$ and $y_1$ begins at $g_{b+1}$ and ends at $g_c$. If we constrain the entries at these positions in the recombination matrices to be equal, $r_a^i = r_{a+1}^i = r_{a+2}^i = \ldots = r_c^i$, it is evident that the reproduction function will create a $g^*$ matrix containing either

$$\ldots, \mu_a, \ldots, \mu_b, y_1^1, \ldots, y_c^1$$

or

$$\ldots, x_1^2, \ldots, x_b^2, \mu_{b+1}, \ldots, \mu_c$$

And if an allele from one of the sex chromosomes interacts with an interactive pair or a single allele (and is therefore in a different matrix than the other alleles from its chromosome), we need to be sure to constrain the appropriate entry in the appropriate recombination matrix. In this way, then, we can represent the transference of all alleles that are transferred with no genetic recombination from one or the other of the two sex chromosomes to a gamete in organisms that contain different types of sex chromosomes.

From this we can see that whether or not this action represents the occurrence of genetic recombination depends on the relation between each entry in the recombination matrices (and on whether alleles from the same chromosome are represented by entries in the same genotype matrices). For any $r_a^i$ and $r_b^i$, the sum $r_a^i + r_b^i$ can equal 0, 1, or 2 (since every $r_e^i$ is either 1 or 0), so if alleles that came from the same chromosome are represented by entries in the same genotype matrix, values of 0 and 2 represent the occurrence of no recombination while a value of 1 represents the occurrence of genetic recombination. We can also see from this that the reproduction function under recombination entry symmetry for each pair of matrices can represent the production of an organism that contains a copy of one or the other of the alleles from each genetic locus of its progenitors.
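The general reproduction function and the general expression function of Section 2.4 can be combined in a short sketch. The genotype matrices are those of the demonstration in Section 2.4 and the recombination matrices are those of the worked example given later in this section; the modulus n = 210 is an assumption inferred from those entries, diagonal matrices are stored as plain lists, and all function names are hypothetical.

```python
N = 210  # assumed modulus, inferred from the example entries

MU = [1, 1, 0, 0, 0, 1, 0]      # 1 marks a multiplicative position
ALPHA = [0, 0, 1, 1, 1, 0, 1]   # 1 marks an additive position


def reproduce(g_odd, g_even, r_odd, r_even):
    """R(g, g') = g r + g' r' for one pair under recombination entry symmetry."""
    if any(a + b != 1 for a, b in zip(r_odd, r_even)):
        raise ValueError("recombination entry symmetry violated")
    return [x * a + y * b for x, y, a, b in zip(g_odd, g_even, r_odd, r_even)]


def gametes(genotype, recomb):
    """Apply reproduce() to each consecutive pair of genotype matrices."""
    return [
        reproduce(genotype[k], genotype[k + 1], recomb[k], recomb[k + 1])
        for k in range(0, len(genotype), 2)
    ]


def express(genotype, mu=MU, alpha=ALPHA, n=N):
    """General expression: (product of entries) mu + (sum of entries) alpha, mod n."""
    p = []
    for e, (m, al) in enumerate(zip(mu, alpha)):
        prod, total = 1, 0
        for g in genotype:
            prod = (prod * g[e]) % n
            total = (total + g[e]) % n
        p.append((prod * m + total * al) % n)
    return p


def is_linked(r, start, stop):
    """True if positions start..stop-1 share one value (no recombination there)."""
    return len(set(r[start:stop])) == 1


G = [
    [85, 106, 10, 10, 10, 15, 6],
    [85, 175, 10, 10, 30, 21, 12],
    [105, 1, 60, 20, 20, 141, 0],
    [175, 1, 30, 10, 40, 1, 0],
]
R_FIRST = [
    [0, 1, 0, 0, 0, 1, 0], [1, 0, 1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1, 0],
]
R_SECOND = [
    [0, 1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1], [1, 1, 1, 0, 1, 0, 0],
]

g1s, g3s = gametes(G, R_FIRST)    # matrices contributed by the first parent
g2s, g4s = gametes(G, R_SECOND)   # matrices contributed by the second parent
p_new = express([g1s, g2s, g3s, g4s])
print(g1s, g3s, g2s, g4s, p_new, sep="\n")
```

`is_linked` is only a convenience for checking the linkage constraint on a block of positions; for instance, positions 3 to 5 of the first recombination matrix above share a value, while positions 1 to 3 do not.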
Yet, let us consider what happens in the simple case of two organisms with two genotype matrices each giving an exact copy of one of their matrices to form another organism. If their alleles were not systematically assigned to specific positions in their genotype matrices, then the requirement for interactive alleles might be violated when the new organism enters a gate that expresses its phenotype. Thus we need a species requirement: if two organisms in this formulation represent true organisms of the same species, then each position in their genotype matrices must contain an entry representing an allele from the same gene. Consequently, since the reproduction action copies entries from genotype matrices into the same position in a new matrix, new organisms constructed from organisms complying with the species requirement will not violate the interactive allele requirement. And, to comply with these requirements, when representing the fusion of gametes, for each value of $\eta$, the $g^{\eta *}$ that were contributed from one of the paths should become $g^{\eta}$ and the $g^{\eta *}$ that were contributed from the other path should become $g^{\eta+1}$ when they enter the subsequent phenotype expression gate. Another aspect of the species requirement should be that if two organisms in this formulation represent true organisms of the same species, then each position in their genotype matrices must also contain an entry from $Z_n$ for the same n (because adding or multiplying numbers from $Z_n$ and $Z_m$ is undefined if n does not equal m). In this way, then, the reproduction action can represent the creation of a new organism through the fusion of gametes which contain a copy of one allele from each genetic locus in their progenitors' genotypes. Additionally, this formulation allows us to study the effects of overlapping generations, since reproduction gates only need to occur simultaneously for organisms that are mating together.
Thus, the organisms of each generation can enter reproduction gates at different times from each other.

For asexual reproduction, we need to use a slightly modified reproduction function:

$$R(g^h) = g^h I = g^{h*}$$

Clearly this function just makes copies of each genotype matrix and then, if the path the newly created $g^*$ matrices follow is the only path leading to a subsequent expression gate, this can represent the creation of a new organism through asexual reproduction, since it will represent the creation of an organism that inherits its alleles from a single progenitor.

To demonstrate the effect of recombination matrices, let us have two identical organisms enter simultaneous reproduction gates containing different recombination matrices, using the same organism as before:

$$g^1 = \mathrm{diag}(85, 106, 10, 10, 10, 15, 6)$$
$$g^2 = \mathrm{diag}(85, 175, 10, 10, 30, 21, 12)$$
$$g^3 = \mathrm{diag}(105, 1, 60, 20, 20, 141, 0)$$
$$g^4 = \mathrm{diag}(175, 1, 30, 10, 40, 1, 0)$$

For one organism, we will use the recombination matrices:

$$r^1 = \mathrm{diag}(0, 1, 0, 0, 0, 1, 0)$$
$$r^2 = \mathrm{diag}(1, 0, 1, 1, 1, 0, 1)$$
$$r^3 = \mathrm{diag}(1, 0, 1, 0, 1, 0, 1)$$
$$r^4 = \mathrm{diag}(0, 1, 0, 1, 0, 1, 0)$$

So performing $g^1 r^1 + g^2 r^2 = g^{1*}$ and $g^3 r^3 + g^4 r^4 = g^{3*}$ results in:

$$g^{1*} = \mathrm{diag}(85, 106, 10, 10, 30, 15, 12)$$
$$g^{3*} = \mathrm{diag}(105, 1, 60, 10, 20, 1, 0)$$

And for the other organism, we will use:

$$r^1 = \mathrm{diag}(0, 1, 1, 0, 0, 0, 1)$$
$$r^2 = \mathrm{diag}(1, 0, 0, 1, 1, 1, 0)$$
$$r^3 = \mathrm{diag}(0, 0, 0, 1, 0, 1, 1)$$
$$r^4 = \mathrm{diag}(1, 1, 1, 0, 1, 0, 0)$$

So performing $g^1 r^1 + g^2 r^2 = g^{2*}$ and $g^3 r^3 + g^4 r^4 = g^{4*}$ results in:

$$g^{2*} = \mathrm{diag}(85, 106, 10, 10, 30, 21, 6)$$
$$g^{4*} = \mathrm{diag}(175, 1, 30, 20, 40, 141, 0)$$

Thus, the new organism will have the following phenotype matrix:

$$p = \mathrm{diag}(105, 106, 110, 50, 120, 105, 18)$$

which contains several entries that differ, due to recombination, from the identical phenotype matrices of the parents:

$$p = \mathrm{diag}(0, 70, 110, 50, 100, 105, 18)$$

Consider now a space which contains a reproduction gate with the recombination matrices $r^1, \ldots, r^h$ followed by another reproduction gate at some later time that contains the recombination matrices $r'^1, \ldots, r'^h$, where each pair of recombination matrices is under recombination entry symmetry. For a given position e, if $r_e^i + r_e'^i = 1$, it follows that when $g_e^{i*}$ contains $g_e^i$, then $g_e'^{i*}$ contains $g_e^{i \pm 1}$ and contrariwise. We can see that this is the case, because for each pair of recombination matrices under these two symmetries, $r^{\eta} = r'^{\eta+1}$ and $r^{\eta+1} = r'^{\eta}$ (the e subscript is suppressed for readability), since $r^{\eta} = 1 - r^{\eta+1}$ (due to recombination entry symmetry) and $r^{\eta} = 1 - r'^{\eta}$ (the assumption from above), so $1 - r^{\eta+1} = 1 - r'^{\eta}$ and hence $r^{\eta+1} = r'^{\eta}$. Making the opposite substitution shows that $r^{\eta} = r'^{\eta+1}$. (Hence if $r_e^{\eta} + r_e'^{\eta} = 1$, then $r_e^{\eta+1} + r_e'^{\eta+1} = 1$ and contrariwise.) We will denote $r_e^i + r_e'^i = 1$ as recombination interval symmetry for the position e. So, in a space that has two reproduction gates containing a pair of recombination matrices under both symmetries for a certain position, any organism that proceeds through these reproduction gates will contribute each of the entries from the corresponding pair of genotype matrices to a $g^*$ matrix, because the entry that is not contributed in the first reproduction gate is contributed in the second and contrariwise. Therefore, recombination interval symmetry in the reproduction action conserves allele frequencies between organisms and the $g^*$ matrices they produce. When this symmetry is broken, genetic drift is possible because the $g^*$ matrices each organism produces will contain a random sample of their alleles. We will see an example involving both recombination interval symmetry and genetic drift in Section 6.1. But, because they have different causes, we will specify two different kinds of genetic drift: gametic genetic drift, where a break in recombination interval symmetry causes a change in allele frequency; and reductive genetic drift, where the parents of the next generation are a subset of the children of the previous generation and the new parents' allele frequencies are not a uniform scaling of those of the children of the previous generation. The example in Section 6 will also demonstrate these two kinds of genetic drift.

4. Environmental alteration

In the alteration action, a new matrix is created that will replace the matrix it was produced from; yet the reason for this replacement is so that we can change or exchange particular entries in the organism's matrices, so we need a method of copying entries similar to that of the last section to keep all but the entries being altered constant. Representing the various kinds of mutation will require two types of alteration matrices: the first we will refer to as alteration matrices and the second as translocation matrices. We will first investigate alteration matrices.

4.1. Changes in variation and deletion

First of all, we want an action that uses alteration matrices that do not have to depend on the entries in the matrix they are altering, because if the entries in alteration matrices depend on the entries in the matrix they are altering, then this action will have no predictive power and we would essentially just be manually replacing the matrices. Therefore we will define a pair of alteration matrices, $a$ and $\tilde{a}$, that exist in alteration gates and act in what may seem like an overly complicated expression at first:

$$A(q) = qa + \tilde{a} = q'$$

(where $q$ is either a phenotype matrix $p$, a genotype matrix $g^i$, or a $g^{i*}$ matrix). It follows that the change in any particular entry is $q_a' - q_a = \tilde{a}_a - q_a(1 - a_a)$, so whenever $\tilde{a}_a \neq q_a(1 - a_a)$ there will be a change in the value of $q_a$. Evidently, in order to keep all but the entries being altered constant, $a$ must contain all 1s and $\tilde{a}$ must contain all 0s except at the positions of entries being altered. The main reason for choosing the above function is that $Ж_n$ is not always a ring, so if we want the alteration function to only produce values that are elements of $Ж_n$ (and have entries that do not depend on the entries in the matrix they are altering), then we need a more complicated expression than just $q' = q + a$ or $q' = qa$. The Alteration Proposition (proved in the Appendix) shows that $q_a'$ will be an element of $Ж_n$ as long as $a_a + \tilde{a}_a \in Ж_n$ and $\tilde{a}_a \in Ж_n$, so the environment space can contain any $a$ and $\tilde{a}$ that satisfy these conditions when we want the alteration function to only produce values that are elements of $Ж_n$. For the case of additive alleles, though, since $Z_n$ is closed under modular addition and multiplication, $q_a a_a + \tilde{a}_a$ will be an element of $Z_n$ for any $a_a$ and $\tilde{a}_a$. As stated in the introduction, the matrix represented by $q$ will enter the gate and act in the above function to produce $q'$ but will not leave the gate; instead $q'$ will replace $q$ in the organism's set of matrices. Accordingly, to represent factors that lead to a change of variation in alleles or traits, alteration matrices will be used in this way to change entries in a genotype or phenotype matrix while keeping the other entries constant; specifically, each entry in an alteration matrix that is not the identity will represent an environmental effect that causes a mutation of a particular allele or trait.

To explore the effect of this action, let us first suppose that, in a given population, the set of all entries at $q_a$ is a subset of $Z_n$. If $v$ is not in the set of entries at $q_a$ but $u$ is, then this action on the entry $u$ with $\tilde{a}_a \equiv v - u a_a \pmod{n}$ will change $u$ to $v$. Clearly then the alteration action with certain values of $a_a$ and $\tilde{a}_a$ can produce $q_a'$ that were not in the original set of entries. For example, suppose we have a population of 4 organisms with the following genotypes:

$$g^1 = \mathrm{diag}(15, 25, 21) \quad \mathrm{diag}(25, 25, 15) \quad \mathrm{diag}(15, 25, 15) \quad \mathrm{diag}(25, 25, 21)$$
$$g^2 = \mathrm{diag}(25, 21, 15) \quad \mathrm{diag}(15, 25, 21) \quad \mathrm{diag}(15, 25, 21) \quad \mathrm{diag}(25, 25, 15)$$

So the set of all entries for the first position in this population is {15, 25}. If we choose the alteration matrices $a = \mathrm{diag}(5, 1, 1)$ and $\tilde{a} = \mathrm{diag}(10, 0, 0)$, this will simply flip each 15 to 25 and each 25 to 15, which might change the allele frequencies but will leave the set of all entries unchanged. However, if any of the organisms are acted on by $a = \mathrm{diag}(15, 1, 1)$ and $\tilde{a} = \mathrm{diag}(1, 0, 0)$, a new entry, 16, will be introduced to the set of all entries for the first position in this population, making it {15, 25, 16}. Another way this action can be used is to have $a_a = 0$ and $\tilde{a}_a = \mu_a$ (where $\mu_a$ is the identity element for that position) operate on $g_a^i$ or $g_a^*$. This can represent a mutation that results in the deletion or malfunction of a gene, since that entry will no longer influence the phenotype entry. So we can see that the alteration action on a genotype matrix can represent the mutation of a gene into a different variation or the deletion of a gene. And if the alteration action operates on a germ-line genotype matrix or on a $g^*$ matrix itself, this can represent the introduction of a new genetic variant in a population of true organisms (we will see an example of this in Section 6.1).

Furthermore, the alteration action on a phenotype matrix can represent the alteration of a trait to one that is not the expression of its genotype; and, for certain values of $a$ and $\tilde{a}$, it can also represent the alteration of a trait to one that is not the expression of the genotype of any organism in the population. For example, if we recall the example involving additive alleles that have been multiplied by a constant of measure, resulting in entries in the phenotype matrix that vary between 100 and 160, we can see that using $a_a = 1$ and $\tilde{a}_a$ equal to any number strictly between 0 and 10 will result in an entry that is not the expression of the genotype of any organism in the population. Clearly, then, the alteration action can cause this phenotype entry to assume any integer value from 100 to 160.

4.2. Translocation and duplication

Besides changing particular alleles or traits to other variants, mutations can also change the structure of chromosomes by duplicating sections, inverting sections, translocating sections, etc. However, since the position of each entry in a genotype matrix is determined by the interactive allele requirement and the species requirement and does not necessarily correspond to the alleles' position in a chromosome, our concern is with how these types of mutation alter the combinations of interactive alleles in a certain organism and/or alter the position of alleles in such a way that the combinations of interactive alleles in an organism's offspring will be altered.

In this case, a genotype matrix will again be acted on and then replaced by the matrix produced from this action, but we will instead use translocation matrices, $t$, which can be any permutation matrix (a square binary matrix where each row and column contain all 0s except for one entry) that is symmetric. For example:

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \text{ or } \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \text{ or } \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

And the structure alteration function will be:

$$A(g) = tgt = g'$$

Since $t$ is symmetric, $t_{ij} = t_{ji}$, so for any $t_{ai}$ and $t_{ib}$ in which $a \neq b$, $t_{ai} g_{ii} t_{ib} = 0$ because one of $t_{ai}$ and $t_{ib}$ equals 0; likewise for any $t_{ai}$ and $t_{ja}$ in which $i \neq j$, $t_{ai} g_{ij} t_{ja} = 0$ because one of $t_{ai}$ and $t_{ja}$ equals 0. (And $g_{ij}$ also equals zero whenever $i \neq j$.) This means that the terms of $\sum_a \sum_b t_{ia} g_{ab} t_{bj}$ vanish whenever $a \neq b$ or $i \neq j$, so the only non-zero entries are $g'_{ii} = \sum_a t_{ia} g_{aa} t_{ai}$. It follows then that, for each $t_{ia} = t_{ai} = 1$, the entry at $g_{aa}$ will be transferred to $g'_{ii}$ (and if $i = a$ there is no movement). This action can therefore translocate entries in a single matrix. For example, using the first two matrices from above and $g = \mathrm{diag}(a, b, c)$ we can see that $g'$ will equal $\mathrm{diag}(b, a, c)$ and $\mathrm{diag}(c, b, a)$; and using the second two matrices from above and $g = \mathrm{diag}(a, b, c, d)$ we can see that $g'$ will equal $\mathrm{diag}(d, b, c, a)$ and $\mathrm{diag}(b, a, d, c)$.

Now, to translocate entries from different genotype matrices, the action is more complicated. We need to use the following expressions to exchange entries between $g^i$ and $g^j$:

$$A(g^i) = \epsilon^1 g^i + t ϶^2 g^j t = g^{i\prime} \quad \text{and} \quad A(g^j) = \epsilon^2 g^j + t ϶^1 g^i t = g^{j\prime}$$

where $\epsilon^i$ and $϶^i$ are binary diagonal matrices and $\epsilon^1 + ϶^1 = \epsilon^2 + ϶^2 = \epsilon^1 + t ϶^2 t = \epsilon^2 + t ϶^1 t = I$. It is perhaps more instructive to begin with an example in this case. In the following, a and e exchange places in the matrices $g^i = \mathrm{diag}(a, b, c)$ and $g^j = \mathrm{diag}(d, e, f)$:

$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} d & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & f \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} e & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}$$

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} d & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & f \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} d & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & f \end{bmatrix}$$

It follows that multiplying $\epsilon^1 g^i$ results in a matrix whose diagonal contains either $g_e^i$ or 0, and that multiplying $϶^2 g^j$ results in a matrix whose diagonal contains either $g_e^j$ or 0, since $\epsilon^1$ and $϶^2$ are just identity matrices with a few 1s replaced with 0s; and we know from above that $t ϶^2 g^j t$ will simply move entries around on the diagonal, so the products $\epsilon^1 g^i$ and $t ϶^2 g^j t$ both produce diagonal matrices that contain either $g_e^i$, $g_e^j$, or 0 along their diagonal. Furthermore, since $\epsilon^1 + t ϶^2 t = I$, whenever $[\epsilon^1 g^i]_e = g_e^i$, $[t ϶^2 g^j t]_e$ will equal 0, and whenever $[\epsilon^1 g^i]_e = 0$, $[t ϶^2 g^j t]_e$ will equal $g_e^j$, so for each position where $\epsilon^1$ contains a 1, the diagonal will contain the entry from $g^i$, and for each position where $\epsilon^1$ contains a 0 (on the diagonal) it will contain the entry from $g^j$. The same reasoning applies to $\epsilon^2 g^j + t ϶^1 g^i t = g^{j\prime}$.

These functions can therefore translocate entries in or between an organism's genotype matrices and, when the genotype matrices being acted on are replaced by the newly constructed matrices, we can use this action to represent phenomena in which pairs of alleles in a true organism's genotype are affected in such a way that they interact with new alleles and/or phenomena in which pairs of genes are affected in such a way that they are no longer located at the same locus as they are in the rest of the organisms of that species.

If we remember back to Section 1.2, it was suggested that "extra" pairs of genotype matrices containing only identity elements be included in the organism's set of matrices and that "extra" identity elements be included in each genotype matrix so that we can add entries without having to change the size of any matrices.
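The alteration function of Section 4.1 and the translocation functions of Section 4.2 can be sketched together. This is an illustration under stated assumptions rather than the paper's implementation: diagonal matrices are stored as lists, a symmetric permutation matrix is stored as a list `perm` giving the column of the 1 in each row, the modulus n = 30 is inferred from the entries of the four-organism example, and all function and parameter names are hypothetical.

```python
def alter(q, a, a_tilde, n):
    """A(q) = q a + a_tilde (mod n), applied entrywise along the diagonal."""
    return [(x * y + z) % n for x, y, z in zip(q, a, a_tilde)]


def a_tilde_for(u, v, a, n):
    """Entry of a_tilde that sends u to v for a given entry a: v - u*a (mod n)."""
    return (v - u * a) % n


def translocate(g, perm):
    """A(g) = t g t for a symmetric permutation matrix t, where perm[i] is the
    column holding the 1 in row i. The entry at perm[i] moves to position i."""
    if any(perm[perm[i]] != i for i in range(len(perm))):
        raise ValueError("t must be symmetric")
    return [g[perm[i]] for i in range(len(g))]


def exchange(g_keep, g_from, keep_mask, take_mask, perm):
    """keep_mask * g_keep + t (take_mask * g_from) t for one genotype matrix."""
    moved = translocate([m * x for m, x in zip(take_mask, g_from)], perm)
    return [k * x + y for k, x, y in zip(keep_mask, g_keep, moved)]


N = 30  # assumed modulus for the four-organism example
G1 = [[15, 25, 21], [25, 25, 15], [15, 25, 15], [25, 25, 21]]

flipped = [alter(g, [5, 1, 1], [10, 0, 0], N) for g in G1]   # 15 <-> 25
novel = alter(G1[0], [15, 1, 1], [1, 0, 0], N)               # introduces 16
deleted = alter(G1[0], [0, 1, 1], [1, 0, 0], N)              # entry -> identity 1

# Within-matrix translocation, with numbers standing in for a, b, c, d.
swap12 = translocate([1, 2, 3], [1, 0, 2])
swap14 = translocate([1, 2, 3, 4], [3, 1, 2, 0])

# Between-matrix exchange: a (=1) and e (=5) trade places.
gi, gj, t = [1, 2, 3], [4, 5, 6], [1, 0, 2]
gi_new = exchange(gi, gj, [0, 1, 1], [0, 1, 0], t)
gj_new = exchange(gj, gi, [1, 0, 1], [1, 0, 0], t)
print(flipped, novel, deleted, swap12, swap14, gi_new, gj_new, sep="\n")
```

Storing $t$ as an index list avoids building full matrices; the symmetry check in `translocate` corresponds to requiring that the permutation be its own inverse.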
We can use these “empty” positions to duplicate entries and insert them in other matrices by using the above function to form only one new matrix. In other words, only the $g_i$ matrix will be replaced and entries from $g_j$ will be copied into $g_i'$ in the expression:

$$A(g_i) = \epsilon\,g_i + t\,\backepsilon\,g_j\,t = g_i'$$

For example, if $g_7$ is one of the “extra” genotype matrices and t is equal to the identity matrix, then we can copy entries from $g_4$ into $g_7$ with the above expression and they will interact with the entries from $g_4$ they were copied from (because there is no translocation since t = I). Or, if t ≠ I, the entries from $g_4$ can be copied and moved to a different position where they will interact with entries other than the ones they were copied from (or only identity elements). Thus, this action can be used to represent mutations in which genes are duplicated.

5. Natural selection

To represent natural selection, we need a gate that will dictate whether an organism proceeds to a reproduction gate based on the value of a certain entry in its phenotype matrix. So each natural selection gate will be followed by a reproduction gate (or another natural selection gate if more than one trait is under selective pressure) and will contain the following natural selection function:

$$S(p) = \sum_k p^k s^k$$

where the selecting matrix s is a binary matrix containing a single non-zero entry (located on the main diagonal). We will denote the position of that single non-zero entry as $s^s$. In other words, although we are calculating the trace of the product ps, since $p^s s^s$ will be the only non-zero entry in the product, the output of this function is simply $\sum_k p^k s^k = p^s$.
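As a small illustration of the natural selection function, the sketch below evaluates the trace of ps for diagonal matrices stored as lists of their diagonal entries. The function name `selection_value` and the sample phenotype are assumptions chosen for illustration only.

```python
def selection_value(p, s):
    """S(p) = sum_k p^k s^k, i.e. the trace of p*s for diagonal p and s.

    `p` and `s` are lists holding the diagonal entries; `s` must be binary
    with exactly one non-zero entry, as required of a selecting matrix.
    """
    if sorted(s) != [0] * (len(s) - 1) + [1]:
        raise ValueError("s must contain a single 1 and 0s elsewhere")
    return sum(pk * sk for pk, sk in zip(p, s))

# Hypothetical phenotype diagonal; s selects the second position.
p = [175, 70, 40, 105, 105]
s = [0, 1, 0, 0, 0]
print(selection_value(p, s))  # prints 70, the entry of p at the selected position
```

Because s has a single non-zero entry, the sum collapses to the one phenotype entry under selection, which is what a gate then inspects.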
Each gate also needs to contain a set of selection constants, {σ1, …, σt}, and if $p^s = \sigma_i$ for any i, then that organism is allowed to pass through the gate. Thus the value of $p^s$ and the values of the selection constants determine whether or not the organism proceeds to a reproduction gate. This action can therefore represent circumstances in which an organism in a certain environment with a certain trait reproduces successfully whereas one with a different variant of that trait is unsuccessful in that environment. And, since the natural selection and reproduction actions operate on the genotype as a whole, this formulation automatically models linked selection when there is linkage in the entries of recombination matrices (we will see an example of linked selection in Section 6.1). So, in a given population, if the fitness of a trait is not 1, then there need to be natural selection gates in the environment space of that population; and if the fitness of a trait is not 0, then there need to be organisms with the selected-against trait that do not encounter natural selection gates. In other words, the frequency of natural selection gates in the environment space of a population correlates with the fitness of a given trait, but is not necessarily an exact ratio, since organisms with and without the selected-against trait might not encounter natural selection gates equally often.

6. Applications

This formulation can be used to mathematically describe and simulate biological phenomena and model changes in a population's genetic composition over time (the population genetics of a small population will be simulated in the next section).
Mendelian inheritance has a simple formulation: $g_1 g_2 = p$ formulates the Law of Dominance, where each entry in $g_1$, $g_2$ is an element of $Z_2$; the breeding function $g_1 r_1 + g_2 r_2 = g^*$, under the crossover symmetry $r_1 + r_2 = I$, formulates the Principle of Segregation; and the Principle of Independent Assortment can be formulated with a requirement that half the spaces in a system have $r^a_1 + r^b_2 = 1$ for every a and b. Additionally, we have seen that modifications to Mendel's principles, like dominance hierarchies, co-dominance, epistasis, alleles with additive effects, etc., can also be represented by a generalization of the formulas that describe Mendelian inheritance. A major advantage becomes apparent when this formulation is used to model organisms under selective pressure, because it can model the effect of natural selection on the whole genotype. Natural selection not only affects the allele frequency of the genes that give rise to the selected trait (and the genes linked to those genes), but can also cause reductive genetic drift in the allele frequency of all other alleles in the organism's genotype due to the sampling error that might arise from the reduction in the breeding population. This is because, for all genes besides the ones that give rise to the characteristic involved in the selection, the organisms that are prevented from breeding contain a random sample of the alleles from the original population. And since the actions of this formulation can operate individually on each organism's genotype as a whole, this formulation automatically simulates this phenomenon (we will also see an example of this type of genetic drift in the next section). This advantage also extends to studying the effects of having more than one characteristic under selective pressure, because selective pressure on each characteristic can cause reductive genetic drift in the frequencies of the alleles that give rise to the other characteristics involved in selection.
Yet another advantage is that the frequency of natural selection gates can be varied in each generation, so that the simulated population will be under selective pressure in which the fitness values change over time.

6.1. Population genetics

We will now work a full example involving each type of gate (as shown in Fig. 5; time moves downward in the direction of the arrows). This example will demonstrate a simulation involving gametic genetic drift, linked selection, and reductive genetic drift arising from natural selection. (Because the population is so small, this will be an extreme example of these phenomena.)

Fig. 5. Environment space for this population.

The following germ-line genotype matrices with entries from $Z_{210}$ will be used, where each “male” will contain one entry that represents an entry from an X chromosome and one entry that represents an entry from a Y chromosome, while “females” will contain two entries that represent entries from an X chromosome (all 1s are identity elements that do not represent an allele):
Male 1: $g_1$ = 175, 106, 20, 105, 1; $g_2$ = 175, 175, 20, 1, 105
Female 1: $g_1$ = 85, 36, 30, 105, 1; $g_2$ = 85, 175, 30, 105, 1
Male 2: $g_1$ = 85, 106, 30, 175, 1; $g_2$ = 175, 36, 20, 1, 105
Female 2: $g_1$ = 105, 106, 10, 175, 1; $g_2$ = 105, 85, 20, 105, 1
Male 3: $g_1$ = 85, 85, 10, 105, 1; $g_2$ = 105, 175, 10, 1, 105
Female 3: $g_1$ = 175, 85, 20, 105, 1; $g_2$ = 175, 85, 10, 175, 1

So the set of entries for each position in the genotype matrices of this population is:

g = {105, 175, 85}, {36, 106, 175, 85}, {10, 20, 30}, {105, 175}, {105}

The first gate the organisms enter is a phenotype expression gate containing the general phenotype expression function and the matrices μ = 1, 1, 0, 1, 1 and α = 0, 0, 1, 0, 0. Thus phenotype expression operates in this way:

$$g^1_1 g^1_2,\quad g^2_1 g^2_2,\quad g^3_1 + g^3_2,\quad g^4_1 g^4_2,\quad g^5_1 g^5_2$$

and results in the following phenotype matrices:

Male 1: p = 175, 70, 40, 105, 105
Female 1: p = 85, 0, 60, 105, 1
Male 2: p = 175, 36, 50, 105, 105
Female 2: p = 105, 190, 30, 105, 1
Male 3: p = 105, 175, 20, 175, 105
Female 3: p = 105, 85, 30, 105, 1

So the set of entries for each position in the phenotype matrices of this population is:

p = {105, 175, 85}, {36, 175, 85, 70, 0, 190}, {10, 20, 30, 40, 50, 60}, {105, 175}, {105}

According to Fig. 5, Male 3 next enters an alteration gate, containing the alteration function and the alteration matrices a = 1, 1, 1, 1, 105 and ã = 0, 0, 0, 0, 70 that operate on its germ-cell $g_2$. This will result in Male 3's $g_2$ being changed to 105, 175, 10, 1, 175, since 105 × 105 + 70 ≡ 175 (mod 210).

After that, all organisms except Male 1 and Female 2 enter a natural selection gate containing s = 1, 0, 0, 0, 0 and σ = 105. Thus, only organisms with $p^1 = 105$ will leave the gate and proceed to a reproduction gate. So we can see that Male 2 and Female 1 will not leave the natural selection gate, since for them $p^1$ = 175 and 85 respectively. (Note that these happened to be the only organisms that contained $g^2_i = 36$.) However, even though Male 1 has $p^1 = 175$, it will still proceed to a reproduction gate since there was no natural selection gate in its path.

And finally, the remaining organisms will proceed to reproduction gates containing the following recombination matrices, where we will have $r^1_i = r^3_i$ for all recombination matrices in order to demonstrate linked selection (since $p^1$ was the trait involved in the selection action); note also that for the males it must be that $r^4_i = r^5_i$ for the reasons stated in Section 2:

Male 1: $r_1$ = 0, 1, 0, 1, 1; $r_2$ = 1, 0, 1, 0, 0
Female 2: $r_1$ = 0, 1, 0, 0, 0; $r_2$ = 1, 0, 1, 1, 1
Male 3: $r_1$ = 1, 1, 1, 0, 0; $r_2$ = 0, 0, 0, 1, 1
Female 3: $r_1$ = 0, 0, 0, 1, 0; $r_2$ = 1, 1, 1, 0, 1

If we index each $g^*$ matrix created by a female as $g_1$ and each $g^*$ matrix created by a male as $g_2$ when these enter the subsequent expression gate, then we have the following new organisms:

New Female 1: $g_1$ = 105, 106, 20, 105, 1; $g_2$ = 175, 106, 20, 105, 1
New Male 1: $g_1$ = 175, 85, 10, 105, 1; $g_2$ = 85, 85, 10, 1, 175

Then the original organisms will enter another reproduction gate in which (to save space) we will just use the same recombination matrices but switch them between the organisms of each gender (the recombination matrices previously used with Male 1 are now used with Male 3, etc.). And if we index them in the same manner, we have two more new organisms:

New Male 2: $g_1$ = 105, 85, 20, 175, 1; $g_2$ = 175, 175, 20, 1, 105
New Female 2: $g_1$ = 175, 85, 10, 175, 1; $g_2$ = 105, 85, 10, 105, 1

And the set of entries for each position in the genotype matrices of this population is:

g = {105, 175, 85}, {106, 175, 85}, {10, 20}, {105, 175}, {105, 175}

We can see that the allele frequencies of $g^1$ = 105:175:85 changed from 0.25:0.42:0.33 in the original population to 0.375:0.5:0.125 due to the natural selection action. So the frequency of $g^1 = 105$ increased as we would expect. However, there were also several other changes due to the presence of the natural selection gates. This population no longer contains $g^2 = 36$ and $g^3 = 30$ as possible entries even though natural selection acted on $p^1$. The disappearance of $g^2 = 36$ is an example of reductive genetic drift due to natural selection, since the two organisms that were selected against happened to be the only ones containing $g^2 = 36$. And the disappearance of $g^3 = 30$ is due to linked selection, because $r^1_i = r^3_i$ and all of the matrices containing $g^3 = 30$ were matrices containing $g^1 = 85$, which increased that organism's chance of having a value of $p^1 \neq 105$. Note also that there was recombination interval symmetry in $r^4 + r'^4$ and, as expected, there was no gametic genetic drift in $g^4$; there also happened to be no reductive genetic drift, since there was no sampling error in the parental population for the allele frequency in $g^4$.
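The first reproduction gate of this example can be replayed in a few lines. This is a sketch under stated assumptions: diagonal matrices are stored as lists of their diagonal entries, the helper names `breed` and `express` are hypothetical, and the form μ·(g1 g2) + α·(g1 + g2) used for phenotype expression is inferred from the position-by-position description of the expression gate rather than quoted from the general phenotype expression function.

```python
N = 210                      # entries live in Z_210
MU = [1, 1, 0, 1, 1]         # positions expressed multiplicatively
ALPHA = [0, 0, 1, 0, 0]      # positions expressed additively

def breed(g1, g2, r1, r2):
    """Breeding function g1*r1 + g2*r2 = g* for diagonal matrices."""
    if any(x + y != 1 for x, y in zip(r1, r2)):
        raise ValueError("crossover symmetry r1 + r2 = I is violated")
    return [(a * x + b * y) % N for a, b, x, y in zip(g1, g2, r1, r2)]

def express(g1, g2):
    """Assumed expression rule: mu*(g1*g2) + alpha*(g1 + g2), mod N."""
    return [(m * a * b + al * (a + b)) % N
            for a, b, m, al in zip(g1, g2, MU, ALPHA)]

# Genotype and recombination matrices of Male 1 and Female 2 from the example.
male1 = ([175, 106, 20, 105, 1], [175, 175, 20, 1, 105])
female2 = ([105, 106, 10, 175, 1], [105, 85, 20, 105, 1])
male1_r = ([0, 1, 0, 1, 1], [1, 0, 1, 0, 0])
female2_r = ([0, 1, 0, 0, 0], [1, 0, 1, 1, 1])

# The female gamete is indexed g1 and the male gamete g2 in the offspring.
new_g1 = breed(*female2, *female2_r)
new_g2 = breed(*male1, *male1_r)
print(new_g1, new_g2)            # genotype matrices of New Female 1
print(express(*male1))           # phenotype of Male 1
print(express(new_g1, new_g2))   # phenotype of New Female 1
```

Under these assumptions the sketch reproduces the genotype listed for New Female 1 and the phenotypes listed for Male 1 and New Female 1.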
So the original 2/3 frequency of $g^4 = 105$ is found in the original population, in the parents of the new generation, and also in the new population. But because there was a sampling error in the parental population and because there was no recombination interval symmetry in $r^2 + r'^2$, not only were no entries of 36 passed to the new population, but the frequency of the other entries of $g^2$ changed twice. In the original population, the frequency ratio of 36:106:175:85 was 0.18:0.27:0.27:0.27, while in the parents it changed to 0:0.25:0.25:0.5 due to reductive genetic drift from natural selection and it was then changed to 0:0.25:0.125:0.625 due to gametic genetic drift. And we can see that this environment space ultimately produces new males and females that all have phenotypes that are different from the organisms in the original population, and New Male 1 even has the new trait $p^5 = 175$, which was introduced by mutation.

New Female 1: p = 105, 106, 40, 105, 1
New Male 1: p = 175, 85, 20, 105, 175
New Male 2: p = 105, 175, 40, 175, 105
New Female 2: p = 105, 85, 20, 105, 1

Clearly a computer program is required for larger populations with larger sequences of gates, but even this short example shows that this formulation provides a powerful method of studying natural selection since it acts on the genotype and phenotype as a whole.

6.2. Epistatic ratios

The phenotypic ratios that are usually listed as arising due to epistatic relationships between multiple loci cannot all be produced by the operation $\prod^{h} g$. They can, however, be produced by uglier equations like $(g^e_1 g^e_2 + \varepsilon)\,g^e_3 g^e_4$ (where ε equals either 1 or 0). Part of the reason this equation is unappealing is that we must require that $g^e_1$ and $g^e_2$ be either 0 or 1 (and it further complicates the phenotype expression function).
For example, if we have a population in which organisms contain the entries {0, 1} at $g^e_1$, $g^e_2$ and the entries {b, c} at $g^e_3$, $g^e_4$, where b dominates c, then using ε = 0 will of course result in the ratio for dominant epistasis that we found earlier (because the equation is reduced to $g^e_1 g^e_2 g^e_3 g^e_4$). And, if ε = 1, then this equation will produce the entries $p^e = b$, $p^e = c$, and $p^e = 0$ in the ratio 9:3:4.

It is possible, though, that corrections need to be made to the currently accepted phenotypic ratios, because the operation $\prod^{h} g$ can produce two of the ratios associated with epistasis (as we have already seen) and one that is similar to the currently accepted phenotypic ratio of recessive epistasis: if we have a population in which organisms contain the entries {a, b} at $g^e_1$, $g^e_2$ and the entries {c, d} at $g^e_3$, $g^e_4$, where a dominates b, c dominates d, and a, b are extraneously prime to c, d, then the phenotype expression function will produce the entries $p^e = ac$, $p^e = bc$, $p^e = ad$, and $p^e = bd$ in the ratio 9:3:3:1. Now, there are two genes in chickens that each have two variations that interact to express 4 different comb shapes, denoted walnut, rose, pea, and single, and there are two genes in the pepper Capsicum annuum that each have two variations that interact to express the colors red, brown, yellow, and green, and these phenotype variations all arise in the same manner as the phenotype entries from above [2]. In contrast, recessive epistasis is described as producing a phenotype ratio of 9:3:4. Coat color in Labrador Retrievers is often cited as an example of recessive epistasis because there is a B locus and an E locus that each have two variations, B/b and E/e, with a dominant/recessive relationship, and the alleles from these two loci interact to express three different coat colors: the combination B_E_ produces black, bbE_ produces brown, and __ee produces yellow [4]. Yet, the B alleles also affect skin color, making it either black (B_) or fleshy brown (bb) [3].
So coat and skin color are not independent traits: there cannot be Labradors with black fur and brown skin nor Labradors with brown fur and black skin. When we take skin color into consideration, it is readily seen that there are four distinct traits that arise in the same manner as the 9:3:3:1 ratio from above: black fur with black skin (B_E_), brown fur with brown skin (bbE_), yellow fur with black skin (B_ee), and yellow fur with brown skin (bbee). This raises the question of whether phenotype ratios of 9:3:4 truly exist in nature or if some aspect of the trait produced by these alleles is not being taken into account when a ratio of 9:3:4 is found.

Likewise, the operation $\prod^{h} g$ cannot produce the currently accepted phenotypic ratios for dominant and recessive epistasis, duplicate recessive epistasis, nor duplicate interaction. However, the operation $\prod^{h} g$ predicts several forms of epistasis besides the three already listed. For example, if we use the entries {a, b} at $g^e_1$, $g^e_2$ and the entries {c, d} at $g^e_3$, $g^e_4$, where a dominates c, c dominates d, and b is extraneously prime to a, c, d, then the phenotype expression function will produce the entries $p^e = ab$, $p^e = a$, $p^e = bc$, and $p^e = bd$ in the ratio 8:4:3:1. Further research into epistasis will determine whether changes need to be made to the currently accepted phenotypic ratios or to the phenotype expression function.
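The 9:3:3:1 and 8:4:3:1 ratios discussed in this section can be reproduced by brute-force enumeration. The sketch below rests on several assumptions that are not taken from the text: the concrete elements of $Z_{210}$ are chosen only so that the required dominance relations hold (x dominates y meaning xy ≡ x) and the four products are distinct, which is one possible reading of "extraneously prime"; the 16 combinations of entries are treated as equally likely, as in a cross between double heterozygotes; and the name `phenotype_counts` is hypothetical.

```python
from collections import Counter
from itertools import product

N = 210  # modulus assumed for illustration

def phenotype_counts(locus1, locus2):
    """Count the products g1*g2*g3*g4 (mod N) over all 16 combinations of
    two entries drawn from locus1 and two entries drawn from locus2."""
    counts = Counter()
    for g1, g2 in product(locus1, repeat=2):
        for g3, g4 in product(locus2, repeat=2):
            counts[(g1 * g2 * g3 * g4) % N] += 1
    return counts

# Case 1: a dominates b and c dominates d (assumed values).
a, b, c, d = 105, 175, 196, 91
case1 = phenotype_counts((a, b), (c, d))
print(sorted(case1.values(), reverse=True))   # [9, 3, 3, 1]

# Case 2: a dominates c, c dominates d, b unrelated to them (assumed values).
a2, b2, c2, d2 = 105, 106, 175, 91
case2 = phenotype_counts((a2, b2), (c2, d2))
print(sorted(case2.values(), reverse=True))   # [8, 4, 3, 1]
```

With these particular choices the product ab happens to be 0 in the second case, which is still a distinct phenotype entry and so counts as its own class.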
6.3. Using elements of Zn – Жn

The elements of Жn can be used in the representation of relationships involving complete dominance, where the expression of each allele by itself produces the same trait as the expression of that allele interacting with another copy of itself, since these elements are by definition multiplicatively idempotent elements. Conversely, the elements of Zn – Жn can be used in the representation of alleles in which the expression of that allele by itself produces a different trait than the expression of that allele interacting with another copy of itself, since these elements are by definition not multiplicatively idempotent elements. It has already been shown that a haploinsufficient relationship can be represented using certain elements of Zn – Жn paired with an element of Жn, but there are also other relationships that can be produced with different pairings. If we have u ∈ Жn and x ∈ Zn – Жn, then we can always have ux = u, since 0 is an element of Жn; however, there are other possibilities for u: for example, 3 × 3 ≡ 3 × 5 ≡ 3 (mod 6), while 5 × 5 ≡ 1. Thus we can use 0 (or certain other elements of Жn) paired with an element of Zn – Жn to represent a relationship where the expression of A alone produces the same trait as the expression of A with A and A with B, but differs from the expression of B with B and of B by itself (which are both different from each other). So there are x ∈ Zn – Жn for which ux = u and for which ux = x (haploinsufficiency), and there is also the possibility that ux = z (where z ≠ u ≠ x). For example, 7 × 7 ≡ 19 (mod 30) and 7 × 16 ≡ 22 and 16 × 16 ≡ 16. We can use numbers like these to represent a relationship where the expression of A alone produces the same trait as the expression of A with A, but differs from the expression of A with B, B with B, and B by itself (which are all different from each other). Finally, there are numbers for which xx = u but x ≠ u. For example, 9 × 9 ≡ 21 × 21 ≡ 21 (mod 30) and 21 × 9 ≡ 9.
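The modular products quoted in this section, and the set Жn itself, are easy to check by direct computation. The following is a minimal sketch; the function names `idempotents` and `dominance_theorem_holds` are hypothetical, and the second function simply tests the statement of the Theorem of Dominance from the Appendix over every pair of idempotents for a given n.

```python
from math import gcd

def idempotents(n):
    """Elements u of Z_n with u*u = u (mod n), i.e. the set written Жn."""
    return [u for u in range(n) if (u * u) % n == u]

def dominance_theorem_holds(n):
    """Check, for all idempotent u and v, that uv = u (mod n) exactly when
    gcd(v, n) divides u."""
    elems = idempotents(n)
    return all(((u * v) % n == u) == (u % gcd(v, n) == 0)
               for u in elems for v in elems)

print(idempotents(6))    # [0, 1, 3, 4]
print(idempotents(30))   # [0, 1, 6, 10, 15, 16, 21, 25]

# Products quoted in the text.
print((3 * 3) % 6, (3 * 5) % 6, (5 * 5) % 6)        # 3 3 1
print((7 * 7) % 30, (7 * 16) % 30, (16 * 16) % 30)  # 19 22 16
print((9 * 9) % 30, (21 * 21) % 30, (21 * 9) % 30)  # 21 21 9
print(all(dominance_theorem_holds(n) for n in (6, 30, 210)))  # True
```

An exhaustive check like this is only a sanity test for small n; it does not replace the proofs given in the Appendix.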
We can use numbers like these to represent a relationship where the expression of A alone expresses the same trait as the expression of A with A and B with B, but differs from the expression of A with B or B alone (which are the same). And there are similar relationships when numbers that are both elements of Zn – Жn are paired together. Further research will determine whether there are organisms with traits that are expressed by relationships of these sorts.

Appendix

Theorem of Dominance. Given that u, v ∈ Жn, then uv ≡ u (mod n) if and only if the greatest common divisor of v and n also divides u.

Proof. Suppose that the greatest common divisor of v and n also divides u. In other words, suppose that u = αi, v = αj, and n = αk, where gcd(v,n) = α, so j, k are relatively prime. By definition of being an element of Жn, vv ≡ v (mod n), hence nm = v(1 – v). It follows then that j(1 – v) = km, so k must divide 1 – v, since j and k are relatively prime; thus 1 – v = kh. Now, since u = αi, it follows that u(1 – v) = αikh, so u – uv = nih (because n = αk). Thus uv ≡ u (mod n).

Now suppose uv ≡ u (mod n), which means that uv – nm = u. Consequently, if the greatest common divisor of v and n is α and v = αj while n = αk, then α(uj – km) = u; thus, since uj – km is an integer, α must divide u and the greatest common divisor of v and n also divides u.

Therefore, uv ≡ u (mod n) if and only if the greatest common divisor of v and n also divides u.

The next proof is useful for finding the elements of Жn for any n.

Elements of Жn. If n/d is an integer relatively prime to d, then there is an a such that an/d ∈ Жn.

Proof. Suppose n/d is an integer relatively prime to d. Now, since the multiplicative modular inverse of n/d modulo d exists if and only if n/d and d are relatively prime, this means there is an a such that an/d ≡ 1 (mod d). It follows then that 1 – an/d = dk, so

(1/d)(1 − an/d) = k. (1)

And if we multiply (1) by an, then we have (where m = ak):

(an/d)(1 − an/d) = nm. (2)

Thus (an/d)(an/d) ≡ an/d (mod n), which means an/d ∈ Жn. Therefore, if n/d is an integer relatively prime to d, then there is an a such that an/d ∈ Жn.

The next lemma shows that certain elements of Zn – Жn can dominate elements of Жn.

Divisor Lemma. For all u ∈ Жn, if gcd(u,n) = δ, then δu ≡ δ (mod n).

Proof. Suppose u ∈ Жn and gcd(u,n) = δ. The statement that gcd(u,n) = δ implies that gcd(k,h) = 1 (where u = δk and n = δh). Now, since u ∈ Жn, uu – u = nm and dividing by δ leaves k(u – 1) = hm. And by Euclid's Lemma it follows that m = lk, since u – 1 is an integer and gcd(k,h) = 1. Thus, rewriting uu – u = nm as k(δu – δ) = nm, we have that δu – δ = nl. Therefore, for all u ∈ Жn, if gcd(u,n) = δ, then δu ≡ δ (mod n).

The alteration proposition

Before proving that the conditions $a_a + \tilde{a}_a,\ \tilde{a}_a \in Ж_n$ imposed on $q_a a_a + \tilde{a}_a = q'_a$ in Section 4.1 ensure that $q'_a \in Ж_n$, we need to first prove a few properties about elements of Жn.

Addition Lemma. If u, v ∈ Жn, then u + v ≡ uu + vv (mod n).

Proof. Suppose u, v ∈ Жn. By definition of being elements of Жn, uu ≡ u (mod n) and vv ≡ v, which means:

uu − v ≡ u − v (1)
u − vv ≡ u − v (2)

Thus uu – v ≡ u – vv, by transitivity of (1) and (2). Therefore, if u, v ∈ Жn, then u + v ≡ uu + vv (mod n).

Closure. If u, v ∈ Жn, then uv ∈ Жn.

Proof. Suppose u, v ∈ Жn. This means that uu ≡ u (mod n) and vv ≡ v by definition. And if we multiply each expression by vv and u respectively, we have:

uuvv ≡ uvv (1)
uvv ≡ uv (2)

Thus uuvv ≡ uv (mod n), by transitivity of (1) and (2). Therefore, if u, v ∈ Жn, then uv ∈ Жn.

Conjugate Lemma. If u ∈ Жn, then u has a conjugate such that 1 – u ∈ Жn.

Proof. Suppose u ∈ Жn. This means that uu ≡ u (mod n). If we add 1 – 2u to each side of uu ≡ u, we get

1 − 2u + uu ≡ 1 − 2u + u (mod n). (∗)

And since (1 – u)(1 – u) = 1 – 2u + uu, (∗) reduces to (1 − u)(1 − u) ≡ 1 − u (mod n). Therefore, if u ∈ Жn, then 1 – u ∈ Жn.

With these results in hand, we can prove the Alteration Proposition (we will suppress the subscript a in $q_a a_a + \tilde{a}_a = q'_a$ to make for easier reading):

Alteration Proposition. If q, a + ã, ã ∈ Жn, then qa + ã ∈ Жn.

Proof. Suppose q, a + ã, ã ∈ Жn. We can begin by rewriting qa + ã as q(a + ã) + ã(1 – q). Since q, a + ã ∈ Жn, it follows that q(a + ã) ∈ Жn (closure of Жn); and, since ã ∈ Жn and 1 – q ∈ Жn (Conjugate Lemma), it follows that ã(1 – q) ∈ Жn (closure of Жn). From the Addition Lemma, then

q(a + ã) + ã(1 − q) ≡ q(a + ã)q(a + ã) + ã(1 − q)ã(1 − q) (1)

Now, if we multiply [q(a + ã) + ã(1 – q)][q(a + ã) + ã(1 – q)], we find this equals:

q(a + ã)q(a + ã) + 2q(a + ã)ã(1 − q) + ã(1 − q)ã(1 − q) (2)

So, since q ∈ Жn and therefore by definition q(1 – q) ≡ 0 (mod n), (2) is congruent to:

q(a + ã)q(a + ã) + ã(1 − q)ã(1 − q) (3)

And since (3) is congruent to [q(a + ã) + ã(1 – q)][q(a + ã) + ã(1 – q)], then by transitivity of (1) and (3):

[q(a + ã) + ã(1 − q)][q(a + ã) + ã(1 − q)] ≡ [q(a + ã) + ã(1 − q)] (mod n)

Therefore, if q, a + ã, ã ∈ Жn, then qa + ã ∈ Жn.

References

[1] B. Pierce, Genetics: A Conceptual Approach, fourth ed., W.H. Freeman, 2010.
[2] B. Pierce, Transmission and Population Genetics, first ed., W.H. Freeman, 2006.
[3] A. Ruvinsky, J. Sampson, The Genetics of the Dog, CABI Publishing, 2001.
[4] J. Templeton, A. Stewart, W. Fletcher, Coat color genetics in the Labrador retriever, J. Heredity 68 (1977) 134–136.