CS4LL5 Assignment 3 Solved

35.00 $

Category:

Description

Rate this product

This is a worked example showing convergence of EM training with IBM model 1. The pairs are

s1 la maison s2 la fleur
o1 the house o2 the flower

To apply the brute force EM algorithm, for each pair, each of its possible alignments has to be considered. Including the possibility of aligning positions in o with NULL, there are 32 = 9 possibilities. To save a little in the pencil-and-paper calculations, we will consider a version which does not allow aligning positions in o with NULL. In this case, there are 22 = 4 possibilities:

la ma la ma la ma la ma the ho the ho the ho the ho

la fl la fl la fl la fl

the flo the flo the flo the flo

use a1 . . . a14 for the 4 possible alignments between o1 and s1 use a21 . . . a24 for the 4 possible alignments between o2 and s2

table shows translations probabilites, with tr(o|s) shown at row o, col s, and they are all ini- tialised to 1

3

tr(o|s) la ma f l

111 333

111 333

111 333

the

ho

flo

To execute the brute force EM algorithm we need first for the pairs o1, s1 and o2, s2 to determine theconditionalalignmentprobabilitiessop(a|o1,s1)andp(a|o2,s2). Theslidesgaveaderivation of the formula for p(a|o, s), it came out to be

􏰌j [p(oj |sa(j))]
p(a|o,s) = 􏰋a′ 􏰌j[p(oj|sa′(j))] (1)

and in the derivation 1 terms cancelled. In the corresponding derivation disallowing (ls +1)lo

alignments to NULL, there will instead be a cancellation of 1 terms, and exactly the same (ls )lo

formula for the conditional alignment probability (1) will be derived. As a name for the numerator term in (1) we will use num(a)
So to determine the p(a|o, s) values for each pair we need to

1

1. for each possible a determine num(a) (ie. 􏰌j[p(oj|sa(j))])
2. sum these to give the denominator 􏰋a num(a) and then take ratios

Armed with these conditional probabilities can then compute expected counts of o, s combina- tions across the corpus, and from these recalculate tr(o|s) probabilities.

ITERATION 1

considering the first pair, for each a1n calculate num(a1n): n u m ( a 1 1 ) n u m ( a 12 ) n u m ( a 13 )

= 1 1 ditto ditto 33

sim. for each a2 calculate num(a2 ). At this stage, these all work out as 1 . nn9

from these to calculate the conditional probabilities P(adn|o,s), need to sum the num(an) by summing across the table and use it as denominator ie.

n u m ( a 14 ) ditto

=1 9

P(adn|od,sd) = 􏰋num(adn)
n′ num(adn′)

P (a1|o1, s1) = 1/9

4×1/9 =1

4

P (a21|o2, s2) = 1/9

4×1/9 =1

4

P (a12|o1, s1) ditto

P (a2|o2, s2) ditto

P (a13|o1, s1) ditto

P (a23|o2, s2) ditto

P (a14|o1, s1) ditto

P (a24|o2, s2) ditto

Notice these numbers make intuitive sense: with all tr(o|s) set equal, all alignments should be equally probable, giving a value of 1 for each.

4

Now for each possible vocabulary combination o,s combination we have to make a count by going through all the alignments and incrementing the count by how many times o is paired with s in the alignment and multiplying that by the above conditional alignment probabilities

For these short sentences the o, s count for any alignment is at most 1, and it will be handy for the calculations to note for each (o, s) the alignments where it occurs once1

la ma fl

2:– based on this we get the following expected counts

2:2 4

the 1:1 2 2:1 2

ho 1:1 3 2:–

1:3 4 1:–

flo 1:– 2:1 3

1:2 4 1:– 2:– 2:–

1:– 1:–

2:–

2:3 4

1to read this table the (ho,la) entry has 1:1 3 to indicate in first pair (o1,s1), the (ho,la) pairing occurs 2:–

in alignments a1 and a13, and the pairing never occurs in the alignments for the second pair

2

cnt la ma fl the 4×1 2×1 2×1

ho2×12×1 0 44

flo2×1 0 2×1 44

and for these counts get new tr(o|s) by normalising by column sums

444

tr(o|s) la ma f l

42

flo 1 0 1 42

111 222

the
ho 1 1 0

ITERATION 2

using new tr(o|s) value re-calculate for each a1n, num(a1n), and for each a2n, num(a2n):

n u m ( a 1 1 ) n u m ( a 12 ) n u m ( a 13 ) n u m ( a 14 ) 11111111
2 41 2 22 2 41 2 22

=8=8=8=8

n u m ( a 21 ) n u m ( a 2 2 ) n u m ( a 23 ) n u m ( a 24 ) 11111111
2 41 2 22 2 41 2 22

=8=8=8=8 then re-calculate the conditional probabilities P (a|o, s).

P (a1|o1, s1) P (a12|o1, s1) P (a13|o1, s1) P (a14|o1, s1) 1111 6363

P (a21|o2, s2) P (a2|o2, s2) P (a23|o2, s2) P (a24|o2, s2) 1111 6363

then re-calculate the expected counts of o, s combinations as shown in the table below (eg. the expected count for (the,la) is coming from a1, a12, a21, a2)

cnt the

626 646 =6 =6

flo 1+1 0 2+2 626 646

=6 =6
and for these counts get new tr(o|s) by normalising by column sums

la ma fl

1+2+ 1+2 1+2 666666

1+2 =3 =3 66666

=6
ho1+12+2 0

3

tr(o|s) la ma f l

57

flo 1 0 4 57

333 577

the
ho 1 4 0

ITERATION 3

using new tr(o|s) value re-calculate for each a1n, num(a1n) and each a2n, num(a2n):

num(a1) num(a12) num(a13) num(a14 ) 31343134 55577577

n u m ( a 21 ) n u m ( a 2 2 ) n u m ( a 23 ) n u m ( a 24 ) 31343134 55577577

then re-calculate the conditional probabilities P (a|o, s).

P (a1|o1, s1) 0.1512

P (a21|o2, s2) 0.1512

P (a12|o1, s1) 0.4321

P (a2|o2, s2) 0.4321

P (a13|o1, s1) P (a14|o1, s1) 0.1080 0.3086

P (a23|o2, s2) P (a24|o2, s2) 0.1080 0.3086

then re-calculate the expected counts of o, s combinations
cnt la ma fl

the 0.1512 + 0.4321 + 0.1512 +

0.4321 = 1.167

ho 0.1512 + 0.1080

0.1080 + 0.3086 = 0.4167

0.4321 +

0.1080 + 0.3086 = 0.4167

0

0.4321 +

0.3086 ==

0.2593 flo 0.1512 +

0.7407 0

0.1080 == 0.2593 0.7407

and for these counts get new tr(o|s) by normalising by column sums tr(o|s) la ma fl

the 0.6923 0.36 0.36 ho 0.1538 0.64 0

f lo 0.1538 0 0.64
Over the 3 iterations, tr(the|la), tr(ho|ma) and tr(flo|fl) are steadily increasing.

If the calculations are carried on, after 10 iterations you have the following for the translation probabilities

4

0.3086

ho f lo

this is the history over the 10 iterations

0.904 0
0 0.904

o|s at each iteration

In the end the tr(o|s) table converges to:

tr(o|s) la ma f l the 0.982 0.096 0.096

0.009 0.009

1 2 3
0.5 0.6 0.69 0.77 0.84 0.25 0.2 0.15 0.11 0.081 0.25 0.2 0.15 0.11 0.081

Obs the house flower the house flower the house flower

Src 0
la 0.33 la 0.33 la 0.33 maison 0.33 maison 0.33 maison 0.33 fleur 0.33 fleur 0.33 fleur 0.33

4 5

6 7 0.89 0.93 0.056 0.037 0.056 0.037 0.2 0.16 0.8 0.84 0.00 0.00 0.2 0.16 0.00 0.00 0.8 0.84

8 9 10 0.95 0.97 0.98 0.024 0.015 0.009 0.024 0.015 0.009 0.14 0.11 0.096 0.86 0.89 0.9 0.00 0.00 0 0.14 0.11 0.096 0.00 0.00 0 0.86 0.89 0.9

0.5 0.43 0.36 0.5 0.57 0.64 0.00 0.00 0.00 0.5 0.43 0.36 0.00 0.00 0.00 0.5 0.57 0.64

0.3 0.24 0.7 0.76 0.00 0.00 0.3 0.24 0.00 0.00 0.7 0.76

tr(o|s) la ma
the 1 0 0 ho 0 1 0 flo 0 0 1

5

  • Assignment_3-9cthhl.zip