Tutorial for game theory: mixed strategies

First, here are some basic definitions:

Two-person zero-sum game

A two-person zero-sum game is a game with two players (Player A and Player B) such that:

One player's loss equals the other's gain ("zero sum").
The outcome of the game is determined by each player's choice from among a fixed, finite set of moves. (The possible moves may be different for the two players.)
The outcome of the game is a score, called the payoff. A positive payoff indicates a win for Player A, a negative payoff indicates a win for Player B, and zero indicates a draw.
If Player A has $m$ moves to choose from and Player B has $n$, we can represent the game using a payoff matrix: the $m \times n$ matrix showing the payoff resulting from each possible pair of choices of moves. In the payoff matrix, Player A's moves are listed on the left, while Player B's moves are listed along the top. We call Player A the row player and Player B the column player.

Example: Rock, paper, scissors

Each player has the same three moves: rock, paper, and scissors. Rock beats (crushes) scissors, scissors beat (cut) paper, and paper beats (wraps) rock.
Each +1 entry indicates a win for the row player, −1 indicates a loss, and 0 indicates a tie.
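
As a minimal sketch of the payoff matrix described above, the following Python snippet stores the rock, paper, scissors payoffs as a nested list, using the same matrix this tutorial gives in its later examples. The names `MOVES`, `PAYOFF`, and `payoff` are illustrative assumptions, not part of the original tutorial.

```python
# Moves in the order used for both rows (Player A) and columns (Player B).
MOVES = ["rock", "paper", "scissors"]

# Payoff matrix from the row player's point of view:
# PAYOFF[i][j] is the payoff when A plays MOVES[i] and B plays MOVES[j].
PAYOFF = [
    [0, -1, 1],   # rock:     ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper:    beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def payoff(row_move, col_move):
    """Return the row player's payoff for one round."""
    return PAYOFF[MOVES.index(row_move)][MOVES.index(col_move)]


print(payoff("rock", "scissors"))   # 1: rock crushes scissors
print(payoff("paper", "scissors"))  # -1: scissors cut paper
```

Because the game is zero sum, swapping the two players' moves flips the sign of the payoff.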

More terms

In each round of the game, the way a player chooses a move is called a strategy. A player using a pure strategy makes the same move each round of the game. For example, if a player in the above game chooses to play scissors (s) at each turn, then that player is using the pure strategy s. A player using a mixed strategy randomly chooses each move a certain percentage of the time; for instance, Player A might choose to play rock 50% of the time, and each of paper and scissors 25% of the time.

We represent the strategy of the row player by a row matrix and the strategy of the column player by a column matrix. In each case, the $i$th entry is the fraction of the time the player uses move $i$.
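
To make the idea of a mixed strategy concrete, here is a small Python sketch that treats a strategy as a list of fractions and picks moves at random with those weights. It assumes the move order rock, paper, scissors and a 50/25/25 mix like the one mentioned above; the helper names `is_valid_strategy` and `choose_move` are hypothetical.

```python
import random

MOVES = ["rock", "paper", "scissors"]


def is_valid_strategy(strategy, tol=1e-9):
    """A strategy is a list of nonnegative fractions that sum to 1."""
    return all(p >= 0 for p in strategy) and abs(sum(strategy) - 1) <= tol


def choose_move(strategy, rng=random):
    """Pick one move at random, using the strategy entries as weights."""
    return rng.choices(MOVES, weights=strategy, k=1)[0]


mixed = [0.5, 0.25, 0.25]   # rock 50%, paper 25%, scissors 25%
pure = [0, 0, 1]            # always scissors

rng = random.Random(0)      # fixed seed so the run is repeatable
plays = [choose_move(mixed, rng) for _ in range(10_000)]
print(plays.count("rock") / len(plays))               # close to 0.5
print({choose_move(pure, rng) for _ in range(100)})   # {'scissors'}
```

A pure strategy is just the special case in which one entry is 1 and the rest are 0.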

Examples

If, in the above game, the row player (Player A) plays rock 50% of the time, and each of paper and scissors 25% of the time, then the row player's (mixed) strategy is written as

$R = \mat[4]{\[.5 , .25 , .25\]}$

If, on the other hand, the row player always plays scissors, then the row player's (pure) strategy is

$R = \mat[4]{\[0 , 0 , 1\]}$

If the column player (Player B) plays rock 20% of the time, paper 50% of the time, and scissors 30% of the time, then the column player's mixed strategy is written as

$C = \mat[4]{\[.20\]!.50!.30}$

You are the head football coach of the Alphas, and are attempting to come up with a strategy to deal with your rivals, the Betas. The Alphas are on offense, and the Betas on defense. You have five preferred plays, but are not sure which to select. You know, however, that the Betas usually employ one of three defensive plays. Over the years, you have diligently recorded the yardage gained by your team for each combination of plays used, and have come up with the following table (negative numbers denote yardage lost):

Q: Suppose both coaches play the mixed strategies in the above example. How many yards will be gained (or lost) by the Alpha team each play?
A: That depends on which specific defense or offense each coach selects for that specific play. (Remember that they are playing mixed strategies.) A better question to ask is: On average, how many yards will be gained by the Alpha team each play? This quantity, the average number of points gained by the row player, is called the expected payoff, and the method of calculation is shown below:

Expected payoff

If a game has payoff matrix $P$, and if the row player uses the mixed strategy $R$ and the column player uses the mixed strategy $C$, then the associated expected payoff $e$ is the average payoff taken over a large number of such games, and is given by the product

$e = RPC$

Note: The quantity $e$ is also referred to as the expected value of the game for the given row and column strategies.

Suppose that a game has payoff matrix $P = \mat[4]{\[2 , -2\]!0 , -1}$ and that the players use the strategies $R = \mat[4]{\[.5 , .5\]}$ and $C = \mat[4]{\[.20\]!.80}$. Then the expected payoff is

$e = RPC$	=	$\mat[4]{\[.5 , .5\]} \mat[4]{\[2 , -2\]!0 , -1} \mat[4]{\[.20\]!.80}$
	=	$\mat[4]{\[1 , -1.5\]} \mat[4]{\[.20\]!.80}$	We first calculated the product $RP$.
	=	$\mat[4]{\[-1\]}$

So $e = -1$, and the row player will lose an average of 1 point per game.
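
The calculation above can be checked with a few lines of Python. This is only a sketch using plain lists; the function name `expected_payoff` is an assumption chosen for illustration, and it follows the same order of multiplication as the worked example ($RP$ first, then the product with $C$).

```python
def expected_payoff(R, P, C):
    """Compute e = R P C for row strategy R, payoff matrix P, column strategy C."""
    # First the row matrix RP: one entry per column of P.
    RP = [sum(R[i] * P[i][j] for i in range(len(R))) for j in range(len(C))]
    # Then the 1x1 product (RP)C.
    return sum(RP[j] * C[j] for j in range(len(C)))


P = [[2, -2],
     [0, -1]]
R = [0.5, 0.5]
C = [0.20, 0.80]

e = expected_payoff(R, P, C)
print(round(e, 10))  # -1.0
```

The result is rounded before printing only to hide floating-point noise.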

Here again is the payoff matrix for rock, paper, scissors:

$P = \mat[4]{\[0 , -1 , 1\]!1 , 0 , -1!-1 , 1 , 0}$

Your name is Bob Stone, and you are playing rock, paper, scissors against Melanie Sharpe. You have a strong preference for playing rock, and play it &11% of the time. You are less fond of paper, which you play &12% of the time. Melanie Sharpe tends to favor scissors, which she plays &16% of the time, whereas she plays rock only &14% of the time. You are the row player. The row and column mixed strategies are:

$R$ =

The expected payoff is then $e = RPC$.

Let us go back to the Alphas and the Betas: Recall that you (the Alpha coach) have decided to try a mixed strategy that uses offenses #1 and #3 each of the time and offense #2 the rest of the time, and that the Betas coach has decided to use defenses #1 and #2 each of the time. The payoff matrix is as follows:

The average net yardage gain for the Alphas (a negative value indicates a net loss) is the expected payoff $e = RPC$.

[Suggestion: Use the Matrix algebra tool to do the calculation.]

Optimal counterstrategy

In the discussion up to this point, we have assumed that the strategies of both players are known. What if only one of the strategies is known? To illustrate this situation, let us go back to the football scenario. Here is the payoff matrix:

Q: As a result of having observed the Betas coach for many years, you (the Alpha coach) know that he tends to use defenses #1 and #2 each 20% of the time, and defense #3 60% of the time. Which (possibly mixed) counterstrategy should you use to make the expected payoff as large as possible?
A: This time, all we know is the mixed column strategy:

$C = \mat[4]{\[.20\]!.20!.60}$. Since you do not know what your own strategy should be, take its entries to be unknowns. That is, let

$R = \mat[4]{\[x , y , z , u , v\]}$. Our task is now to determine which values of $x, y, z, u,$ and $v$ will result in the highest expected payoff. So, we go ahead and compute the expected payoff using what we have:

$e = RPC$	=	$\mat[4]{\[x , y , z , u , v\]}$ $P$ $\mat[4]{\[.20\]!.20!.60}$
	=	$\mat[4]{\[x , y , z , u , v\]}$ $\mat[4]{\[2\]!8.4!-1!7!3}$	It is easiest to first multiply the purely numerical matrices $P$ and $C$. That is, think of the product as $R(PC)$.
	=	$2x + 8.4y - z + 7u + 3v$

Now we need this to be as big as possible. Think of the expression

$2x + 8.4y - z + 7u + 3v$

as a weighted average of the numbers 2, 8.4, −1, 7, and 3, added in the proportions $x, y, z, u,$ and $v$. Since the largest of these numbers is 8.4, we will get the largest weighted average by using 100% of 8.4 (set $y = 1$) and 0% of the other numbers (set the other unknowns equal to 0). (Using the lower values like 2, 7, or 3 would dilute the total effect, and using −1 would really hurt you.) That is,

$x = 0, y = 1, z = 0, u = 0, v = 0$

which gives us the expected payoff as

$e = 8.4(1) = 8.4$

(which is also the largest entry in the product $PC$).

As $y = 1$ and all the other values are zero, the best row strategy is the pure strategy

$R = \mat[4]{\[x , y , z , u , v\]} = \mat[4]{\[0 , 1 , 0 , 0 , 0\]}$ which means that the Alphas should always play offense #2, for which they can expect an average gain of 8.4 yards per play.
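
The weighted-average argument above amounts to picking the largest entry of $PC$. The Python sketch below does exactly that for the five numbers quoted in the text (2, 8.4, −1, 7, and 3); the full payoff matrix is not needed for this step. The function name `best_row_response` is an illustrative assumption.

```python
def best_row_response(PC):
    """Return (index, value) of the largest entry of the column matrix PC.

    e = R(PC) is a weighted average of the entries of PC, so it is
    maximized by putting all the weight on a largest entry.
    """
    best = max(range(len(PC)), key=lambda i: PC[i])
    return best, PC[best]


# Entries of PC quoted in the text for the football example
# (one per offensive play, for C = [.20, .20, .60]).
PC = [2, 8.4, -1, 7, 3]

index, value = best_row_response(PC)
R = [1 if i == index else 0 for i in range(len(PC))]
print(index + 1, value)  # 2 8.4
print(R)                 # [0, 1, 0, 0, 0]
```

If two entries of $PC$ were tied for largest, any mix of the corresponding plays would give the same expected payoff; this sketch simply returns the first one.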

Now here is one for you to do, where this time you must calculate the best pure strategy from the column player's point of view, given knowledge of the row strategy:

Here again is the payoff matrix for rock, paper, scissors:

$P = \mat[4]{\[0 , -1 , 1\]!1 , 0 , -1!-1 , 1 , 0}$

You (the row player as usual) are playing rock, paper, scissors against a very observant opponent. Although you think you are playing purely at random, your opponent has noticed that you tend to play rock 40% of the time, paper 25% of the time, and scissors 35% of the time.

In order to compute the best counterstrategy for the column player, we should set up $R$ and $C$ as follows [Use $x, y, z$ for entries of any strategy that is currently unknown]:

$R$ =

When these strategies are used, the expected payoff is [Use graphing calculator input for formulas, e.g. 2x - 3y + 8z]:

Q: Duh! If my opponent knows I am playing rock most of the time, then obviously she should play paper to beat me. Who needs all this nonsense about matrices to figure that out, especially when it gives the wrong answer?
A: Yes, you are playing rock more often than any other move, and each of those plays would lose to paper. However, you are playing scissors almost as often as rock, so a decision by your opponent to play paper would carry some risk, as paper would only earn her an average of 40 − 35 = 5 points for every hundred plays. A better bet is for your opponent to play rock, which ties every time you play rock, beats your scissors, and loses only to your paper, meaning that your opponent would win an average of 35 − 25 = 10 points for every hundred plays.
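
The comparison in the answer above can be reproduced with a short Python sketch. It assumes the 40/25/35 split for rock, paper, and scissors implied by the arithmetic in the answer, and it works in plays per hundred so that every number stays an integer. The variable names are illustrative only.

```python
P = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

# Row player's observed habits, in plays per hundred: rock, paper, scissors.
R = [40, 25, 35]

# RP[j] = row player's expected winnings per hundred plays
# if the column player always uses move j.
RP = [sum(R[i] * P[i][j] for i in range(3)) for j in range(3)]
print(RP)  # [-10, -5, 15]

# The column player wants the row player's payoff to be as small as possible.
moves = ["rock", "paper", "scissors"]
best = min(range(3), key=lambda j: RP[j])
print(moves[best], -RP[best])  # rock 10
```

The entries of `RP` are the coefficients of $x$, $y$, and $z$ in the expected payoff per hundred plays, so the column player does best by putting all the weight on the most negative one.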

