Matrices and matrices in Latex

I present you yet another post on mathematics. This time I concentrate on some basics of linear algebra and matrices and introduce you to a good video series on the subject. Below the embedded pdf file you will find the links to the referred web pages.

20170604_Matrices_linear_transformations

3Blue1Brown

video series on linear algebra

video on change of basis

video on eigenvectors and eigenvalues

Sprague-Grundy functions in game theory

I have again spent some time studying game theory a bit further with Thomas S. Ferguson’s book Game Theory. The latest topic was Sprague-Grundy functions that are used to describe, how to choose a winning strategy in combinatorial games. Sprague-Grundy functions are often used to analyze games, when it is not clear which positions in the game are P-positions and which are N-positions. If the Sprague-Grundy functions exist for all the positions in the game, a winning strategy is easily found. For simple games, the Sprague-Grundy offers little added information, since it just provides us the P- and N-positions in the game, but for more complex games it gives information that a simple analysis of P- and N-positions does not provide. I will take a look at this in a later post.

Sprague-Grundy function

The Sprague-Grundy (S-G) function g(n) of node n is defined recursively as follows:

  1. For all terminal nodes g(n) = 0.
  2. For all other nodes g(n) is the smallest non-negative integer that does not belong to the Sprague-Grundy values of the nodes direct followers.

It is easy to show, that the definition of a Sprague-Grundy function is equal to the definition of P- and N-positions in a game.

We remember that P- and N-positions are such positions, that the Previous player to move into or the Next player to move from that position can always win the game. The three definitions of P- and N-positions in combinatorial games are:

  1. The terminal position is a P-position.
  2. An N-position is such that there is at least one possible move into a P-position.
  3. A P-position is such that from it is possible to move only into an N-position.

We will now show that Sprague-Grundy functions fulfill these three conditions, where g(n) = 0 is a P-position and g(n) > 0 is an N-position.

Definitions 1 and A show that for the terminal position the S-G value equals the P-position.

For every other position we have either g(n) = 0 or g(n) > 0, respectively. If g(n) = 0, it means that each predecessor of that node must have an S-G value larger than 0. Therefore, from every position with an S-G value of larger than 0 we can move into a position with an S-G value of 0 (definition 2). Also, if we are not at the terminal node and g(n) = 0, we can only move to positions with an S-G value of larger than 0, since otherwise the position we are in could not have an S-G value of 0 (definition 3).

In conclusion, we see that from positions with an S-G value of larger than 0 we can always move to positions with an S-G value of 0 (equal to definition 2), and from positions with an S-G value of 0 we can only move to positions with an S-G value of larger than 0 (equal to definition 3).  Thus, we have shown that the Sprague-Grundy functions give us the P- and N-positions in a game. The difficulty is that they must be defined recursively, and that they do not exist for “circular paths” in games, where you might end up in a previous position after a certain number of moves.

Functions and mental health

Going through my old university mathematics book (ISBN 952-91-9157-X), I read a chapter introducing and discussing functions. At the end of the chapter (see page 163), I saw an interesting exercise that took an interesting perspective on using functions to think about relations between everyday things.

I first introduce a definition of a function, adapted from the book, and then the mentioned exercise, also from the book.

A definition of a function:
A function is a triple (set A, rule f, set B). This means that we have a rule f that pairs each member of set A with exactly one member of set B.

This “pairing” (or mapping) is written as y = f(x), y B for every x A. We might also write f: A → B which means that rule f maps set A, f’s domain, to B.

As we can see from the definition, pairing multiple, different members of set A to set B is not precluded, but each member of A is paired with one and only one member of set B. In addition, it may be that not all members of set B have a pair in set A. I’ll now turn to the exercise.

Psychiatrist’s reception as a function

A psychiatrist’s reception can be modeled as a triplet (psychiatrist, patient, diagnosis). Under which assumptions are the following cases functions?

a) psychiatrist: patients →  diagnoses
b) patient: psychiatrists   diagnoses
c) psychiatrist: diagnoses   patients
d) patient: diagnosis   psychiatrists

We we’ll examine each case from a) to d) individually. In general, based on the notation introduced with the definition of a function, each of cases a) to d) have the following form:

rule: Set A →  Set B

Our chosen triplet (psychiatrist, patient, diagnosis) has to be quite restricted, if cases a) to d) are to be functions. First, a psychiatrist is not allowed to change his diagnosis for a specific patient, since this would preclude mapping the respective members of Set A each to a single member of Set B, as we we’ll soon see. In addition, Set A has to be quite restricted, if all its members are to be mapped. We therefore have one assumption that pertains to all cases a) to d):

Assumption 1 (A1):
A psychiatrist cannot change his diagnosis for a specific patient.

and another assumption whose exact form depends in the respective case:

Assumption 2x (A2x, where x = some case a)-d)):
Set A is limited so that each of its member is mapped to Set B.

Cases a) to d) as functions

Starting with case a), a psychiatrist maps every patient to a diagnosis, potentially multiple patients to the same diagnosis. Assumption A2a) limits the set of patients to those patients that the psychiatrist has diagnosed. The psychiatrist has given every patient some diagnosis, potentially the same to each one.

In case b), a patient maps every psychiatrist to a diagnosis, potentially multiple psychiatrists to the same diagnosis. Assumption A2b) limits the set of psychiatrists to those that have diagnosed the patient. Each psychiatrist may have given the patient the same diagnosis, but not necessarily.

In case c), a psychiatrist maps every diagnosis to a patient, potentially multiple diagnoses to the same patient. Assumption A2c) limits the set of diagnoses to those that the psychiatrist has given to some patient. Each diagnosis may have been given to the same patient, but not necessarily.

In case d), a patient maps every diagnosis to a psychiatrist, potentially multiple diagnoses to the same psychiatrist. Assumption A2d) limits the set of diagnoses to those that the patient has received from some psychiatrist. Each diagnosis may have been given by the same psychiatrist, but not necessarily.

As we can see from the cases above, a large (> 1 member) Set A may give us some interesting real-life consequences in each case:

  • In case a) the psychiatrist may give each patient the same diagnosis.
  • In case b) multiple psychiatrists may all give distinct diagnoses to the same patient.
  • In case c) the psychiatrist may give multiple diagnoses to the same patient.
  • In case d) a patient may receive multiple diagnoses from multiple psychiatrists or from just one.

Now we have defined, under which assumption cases a) to d) are functions. At the same time, we see that these assumptions are not enough to guarantee that our functions correspond to real life. For our function to correspond to real life, we would intuitively like the following conditions to be true:

  • A single patient gets a single, unchanged diagnosis from a single psychiatrist on each visit on the short term. (C1)
  • A single patient gets the same (or almost the same) diagnosis from different psychiatrists on the short term. (C2)
  • A psychiatrist’s diagnosis may change on the long term. (C3)
  • A change in a psychiatrist’s diagnosis can be very dramatic compared to the previous diagnosis. (C4)
  • Similar patients get similar diagnoses. (C5)

A more realistic function

As we saw in the analysis before we have two assumptions that define whether cases a) to d) are functions. Assumption 2 cannot be said to be either realistic or unrealistic, since we only define Set A, which is just an arbitrary choice. Assumption 2 merely limits the usage and potential scope of our function: the more limited our Set A, the fewer are the situations where the respective case is a function.

Assumption 1, however, can be evaluated to be more or less realistic. A psychiatrist changing his diagnosis over time is not something we can define, rather it is a given that is either true or false: between now and a given point of time in the future a psychiatrist either has or hasn’t changed his diagnosis of a patient.

We see that A1 can be evaluated as being realistic or not, depending or whether we are considering a shorter or a longer period in time. On the short term it sounds reasonable to assume that a psychiatrist’s diagnose of a single patient does not change (C1), while on the long term we might assume some changes (C3), especially if the patient has some mental condition. Even for currently mentally healthy people A1 is not necessarily true in the long term, since a person might develop a mental disorder (C4). However, we can overcome the long-term limitation of A1 by creating a composite function for the long-term perspective (C3). This function consists of the respective short-term functions, whose time horizons do not overlap but form a continuum. Thus, this long-term function allows a psychiatrist to change his diagnosis of a patient over time, but for shorter periods, the diagnosis stays unchanged.

By introducing time as a further variable we have redefined our triplet (psychiatrist, patient, diagnosis) as a new triplet (psychiatrist, patient, diagnosis(time)), where the diagnosis is a function of time. By defining psychiatrist = psychiatrist(i) where i is the index of the respective psychiatrist we also introduce C1 into our triplet, thus creating a triplet (psychiatrist(i), patient, diagnosis(time)). In order to consider C5, we also adjust our original triplet in such a way. that each patient is also a function of his attributes that define his mental health, which then influence the respective diagnosis.

Taking case a) now as an example for using the new triplet we observe case a)*:

a)* Psychiatrist(i): patients(attributes)  diagnoses(time)

In case a)* a psychiatrist gives each of his patients always the same diagnosis on the short term, but is allowed to give a different diagnosis on the long term. Previously we defined that the diagnosis remains unchanged on the “short term” but did not define short term. In fact, we could define our time intervals to be infinitesimally short. In this case, the psychiatrist would be allowed to change his diagnosis as quickly as he can, but this is hardly reasonable. As a gut feeling, I would say that our time interval, during which the diagnosis has to stay the same, could be anything between months and years, depending on the patient. Thus, we run into quite subtle questions when deciding, whether we have defined a function or not. Additionally, we would have “patient” as a variable when defining the time interval, thus ending up with diagnosis = diagnosis(time(patient)), making our function even more complicated. Actually, a more clear notation for our new function would be

a)*’ Time: Attributes (Psychiatrist(i): patients(attributes) → diagnoses)

Now a given point in time defines which attributes belong to a specific patient. Then this patient receives from a psychiatrist a specific, unchanged diagnosis within the defined time period, and patients with similar attributes receive the same diagnosis. This seems more like what we would expect to see in real life.

I conclude by saying that case a)* also allows a patient to get different diagnoses from different psychiatrists. For C2 to hold we must also require that the difference between the diagnoses provided by two different psychiatrists is always “small” enough. Here “small” means, roughly speaking, that each psychiatrist could agree on the main points of the other’s diagnosis, while they might have a difference of opinion on some details.

Final thoughts

I was thinking about writing just a few chapters on this but it turned out a bit longer, and I could have written even more. This whole exercise got me thinking more about what function are and how we define them. Especially the properties of nested function like diagnosis = diagnosis(time) and patient = patient(attributes) awoke my interest, since they show how careful we have to be when defining mappings and boundary conditions for them when modeling the real world. It is also clear that at some point we have to make simplifications, if we want to have a usable model and avoid getting entangled in endless nested functions.

I would say that case a*’) as a function captures quite well the essence of psychiatrist’s reception, while also considering potentially multiple psychiatrists, patient attributes and temporal changes in these attributes. The model is surely not perfect, but to me it’s a good start.