Mutual Information Maximization

$$I(A, B) = H(A) - H(A \mid B) = H(B) - H(B \mid A)$$
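This identity can be checked numerically on a small discrete joint distribution; the 2x2 table below is an illustrative assumption, not from the source:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical 2x2 joint distribution p(a, b), chosen for illustration.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

p_a = joint.sum(axis=1)  # marginal p(a)
p_b = joint.sum(axis=0)  # marginal p(b)

H_A = entropy(p_a)
H_B = entropy(p_b)
H_AB = entropy(joint.ravel())

# Conditional entropies via the chain rule H(A, B) = H(B) + H(A | B).
H_A_given_B = H_AB - H_B
H_B_given_A = H_AB - H_A

mi_1 = H_A - H_A_given_B  # I(A, B) from the first form
mi_2 = H_B - H_B_given_A  # I(A, B) from the second form
```

Both forms give the same value, which also matches the direct KL definition of mutual information.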

The mutual information admits a sampled lower bound (InfoNCE). Writing expectations as Monte Carlo averages:

from statistics import mean

def expectation(l):
    N = 10000
    return mean(l() for _ in range(N))

I(A, B) >= expectation(lambda: f_theta(a, b)
                       - log(sum(exp(f_theta(a, b_tilde)) for b_tilde in B_tilde))) \
           + log(len(B_tilde))

Here (a, b) is a positive pair drawn from the joint, B_tilde is a candidate set containing b plus negatives drawn from the marginal of B, and f_theta is a learned critic scoring compatibility.
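A minimal sketch of estimating this lower bound by Monte Carlo; the bilinear critic `f_theta`, the correlated-Gaussian sampler, and the batch sizes are all illustrative assumptions, not the source's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_theta(a, b):
    """Hypothetical critic: a fixed product score rather than a learned network."""
    return a * b

def infonce_estimate(sample_pair, batch_size=128, n_batches=200):
    """Monte Carlo estimate of the InfoNCE lower bound on I(A, B).

    Each batch holds one positive pair (a, b) and batch_size - 1
    negatives b_tilde drawn from the marginal of B."""
    vals = []
    for _ in range(n_batches):
        a, b = sample_pair()
        negatives = np.array([sample_pair()[1] for _ in range(batch_size - 1)])
        # Candidate set B_tilde = {b} plus the negatives.
        scores = np.concatenate(([f_theta(a, b)], f_theta(a, negatives)))
        # f_theta(a, b) - log sum exp f_theta(a, b_tilde), computed stably.
        m = scores.max()
        log_sum_exp = m + np.log(np.exp(scores - m).sum())
        vals.append(scores[0] - log_sum_exp)
    return np.mean(vals) + np.log(batch_size)  # + log |B_tilde|

def sample_pair():
    """Correlated Gaussian pair: b = a + noise."""
    a = rng.normal()
    return a, a + 0.5 * rng.normal()

estimate = infonce_estimate(sample_pair)
```

Note the estimate can never exceed log |B_tilde|, since the positive's score is included in the log-sum-exp; this ceiling is a well-known limitation of the bound.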

Since log(len(B_tilde)) is a constant, maximizing the lower bound over theta reduces to maximizing

expectation(lambda: f_theta(a, b)
            - log(sum(exp(f_theta(a, b_tilde)) for b_tilde in B_tilde)))
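One way to see why this objective is convenient to train: each term is exactly the log-softmax probability of the positive pair among the candidate set, i.e. a standard cross-entropy classification loss. A sketch, with made-up scores:

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over a 1-D score vector."""
    m = x.max()
    return x - (m + np.log(np.exp(x - m).sum()))

# Hypothetical scores f_theta(a, b_tilde) for one anchor a: index 0 is the
# positive pair, the rest are negatives.
scores = np.array([2.0, 0.3, -1.2, 0.5])

# Objective term: f_theta(a, b) - log sum exp f_theta(a, b_tilde).
objective = scores[0] - np.log(np.exp(scores).sum())

# The same quantity as a cross-entropy: log-probability assigned to the
# positive under a softmax over the candidate set.
cross_entropy_form = log_softmax(scores)[0]
```

So in practice the objective can be optimized with an off-the-shelf softmax cross-entropy loss whose "class label" is the index of the positive candidate.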
