
A Review of the Paper "Fairness of Exposure in Rankings"

This article reviews the key points of the paper "Fairness of Exposure in Rankings" by Ashudeep Singh and Thorsten Joachims.

Jason Li · December 05, 2020 · 10 min read

Introduction to the paper

Today, I’m going to share the key points and concepts I took away from the fantastic paper "Fairness of Exposure in Rankings" [1] by Ashudeep Singh and Thorsten Joachims.

In this paper, the authors introduce fairness constraints based on exposure in rankings. They propose three forms of fairness constraints, namely demographic parity, disparate treatment, and disparate impact, to balance fairness to the items being ranked against ranking utility.

Demographic parity [2]

Demographic parity states that protected groups (defined by sensitive attributes such as sex or gender) should have equal opportunities to be shown or recommended. For example, male and female job seekers should receive equal exposure on a job-search website. It is reasonable to distribute exposure more evenly, even at the cost of some ranking utility.

Disparate treatment [2]

Disparate treatment refers to intentional discrimination, for example, testing particular skills only for female applicants on a job-search website.

Disparate impact

Unlike disparate treatment, disparate impact refers to unintentional discrimination; for example, female job seekers who make up a smaller proportion of all job seekers may unintentionally receive less exposure.

The framework for fair ranking

As the authors of this paper point out, the definition of fairness depends on context and application, which means that different fairness constraints should be applied in different scenarios. They develop a general framework for computing utility-maximizing rankings under different fairness constraints. Generally, the problem of optimal ranking under fairness constraints can be formulated as the following optimization problem:

$$r = \argmax_r U(r|q),$$

where $U(r|q)$ denotes the utility of a ranking $r$ for query $q$, and the ranking $r$ specifies the order of a set of documents $\mathcal{D} = \{d_1, d_2, \dots, d_N\}$.

Utility

The utility of a ranking is evaluated from the relevance of the individual items being ranked. $rel(d|u, q)$ denotes the relevance of a document $d$ when a user $u$ searches a query $q$; the relevance could be a binary label, a movie rating, or a real-valued score. In information retrieval systems, a generic way to express utility measures is:

$$U(r|q) = \sum_{u\in\mathcal{U}} P(u|q) \sum_{d\in\mathcal{D}} v(rank(d|r))\, \lambda(rel(d|u, q)),$$

where $\mathcal{U}$ is the set of all users that issue the same query $q$, $v$ reflects the importance of a ranking position, and $\lambda$ is a function mapping relevance to utility. Take Discounted Cumulative Gain (DCG) as an example:

$$DCG(r|q) = \sum_{u\in\mathcal{U}} P(u|q) \sum_{d\in\mathcal{D}} \frac{2^{rel(d|u,q)} - 1}{\log(1 + rank(d|r))},$$

where $2^{rel(d|u,q)} - 1 = \lambda(rel(d|u,q))$ and $\frac{1}{\log(1 + rank(d|r))} = v(rank(d|r))$.

Since the utility is linear in both $v$ and $\lambda$, the above equation can be reformulated as:

$$U(r|q) = \sum_{d\in\mathcal{D}} v(rank(d|r)) \left(\sum_{u\in\mathcal{U}} \lambda(rel(d|u,q))\, P(u|q)\right) = \sum_{d\in\mathcal{D}} v(rank(d|r))\, u(d|q),$$

where

$$u(d|q) = \sum_{u\in\mathcal{U}} \lambda(rel(d|u,q))\, P(u|q)$$

is the expected utility of a document $d$ for query $q$. If the relevances $rel(d|u,q)$ are binary, $u(d|q)$ can be understood as the probability of relevance. Therefore, sorting the documents by $u(d|q)$ is an effective way to maximize the utility of the ranking:

$$\argmax_r U(r|q) \equiv \underset{d\in\mathcal{D}}{\mathrm{arg\,sort}}\ u(d|q).$$

To make this concrete, consider a small example:

| Document ID | Position | Binary Relevance Score | $v(rank(d\vert r))\,u(d\vert q)$ |
| --- | --- | --- | --- |
| 0 | 1 | 0 | $\frac{2^{0} - 1}{\log(1+1)}$ |
| 1 | 2 | 1 | $\frac{2^{1} - 1}{\log(1+2)}$ |
| 2 | 3 | 0 | $\frac{2^{0} - 1}{\log(1+3)}$ |
| 3 | 4 | 1 | $\frac{2^{1} - 1}{\log(1+4)}$ |
| 4 | 5 | 1 | $\frac{2^{1} - 1}{\log(1+5)}$ |
| 5 | 6 | 0 | $\frac{2^{0} - 1}{\log(1+6)}$ |

We can easily obtain the best ranking by sorting the documents by $u(d\vert q)$ in decreasing order; the ranking with maximum utility is $\{d_1, d_3, d_4, d_0, d_2, d_5\}$.
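As a quick sanity check, here is a minimal Python sketch (my own, not from the paper) that reproduces this ordering and shows the resulting DCG improvement:

```python
import numpy as np

# Binary relevance scores u(d|q) for documents d_0 ... d_5, from the table above.
u = np.array([0, 1, 0, 1, 1, 0])

# Position discounts v(j) = 1 / log(1 + j) for ranks j = 1 ... 6.
v = 1.0 / np.log(1 + np.arange(1, 7))

# DCG of the original order: sum over positions of v(j) * (2^rel - 1).
dcg_original = np.sum(v * (2.0 ** u - 1))

# Sorting documents by u(d|q) in decreasing order maximizes the utility.
order = np.argsort(-u, kind="stable")
dcg_sorted = np.sum(v * (2.0 ** u[order] - 1))

print(order)                        # [1 3 4 0 2 5]
print(dcg_original, dcg_sorted)     # the sorted DCG is larger
```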

Probabilistic Rankings

In this paper, the authors use probabilistic rankings $R$ (distributions over rankings) instead of a single deterministic ranking $r$ to avoid combinatorial optimization:

$$U(R|q) = \sum_r R(r) \sum_{u\in\mathcal{U}} P(u|q) \sum_{d\in\mathcal{D}} v(rank(d|r))\, \lambda(rel(d|u,q)) = \sum_r R(r) \sum_{d\in\mathcal{D}} v(rank(d|r))\, u(d|q).$$

To avoid this exponential sum over rankings, the authors compute the utility via the marginal rank distributions [3] of the documents.

Specifically, let $P_{i,j}$ be the probability that $R$ places document $d_i$ at rank $j$; this defines a doubly stochastic matrix [4] $P$ of size $N \times N$, which satisfies:

$$\sum_i P_{i,j} = \sum_j P_{i,j} = 1.$$

Therefore, the expected utility of a probabilistic ranking can be computed as:

$$U(P|q) = \sum_{d_i\in\mathcal{D}} \sum_{j=1}^{N} P_{i,j}\, u(d_i|q)\, v(j),$$

where the expected utilities $u(d_i|q)$ can be stacked into a column vector $u$ of size $N$, and the position values $v(j)$ into a column vector $v$ of size $N$. Thus, the expected utility can be rewritten as:

$$U(P|q) = u^T P v.$$
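As a tiny numpy illustration (my own sketch, borrowing the job-seeker utilities from the experiments section below), the expected utility is just a bilinear form:

```python
import numpy as np

N = 6
u = np.array([0.81, 0.80, 0.79, 0.78, 0.77, 0.76])  # expected utilities u(d_i|q)
v = 1.0 / np.log(1 + np.arange(1, N + 1))           # position discounts v(j)

# Any convex combination of doubly stochastic matrices is doubly stochastic;
# here we mix the identity (a deterministic ranking) with the uniform matrix.
P = 0.5 * np.eye(N) + 0.5 * np.full((N, N), 1.0 / N)

print(u @ P @ v)   # expected utility U(P|q) = u^T P v
```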

Optimizing Fair Rankings via Linear Programming

Since optimizing directly over distributions $R$ is intractable, they formulate the problem of finding the utility-maximizing probabilistic ranking under fairness constraints in terms of doubly stochastic matrices $P$ instead of distributions over rankings:

$$P = \argmax_P u^T P v \quad \text{s.t.} \quad \mathbb{1}^T P = \mathbb{1}^T,\; P\mathbb{1} = \mathbb{1},\; 0 \le P_{i,j} \le 1,\; P \text{ is fair}.$$

The optimization objective is linear in the $N^2$ variables $P_{i,j}$ ($1 \le i, j \le N$), and the fairness constraints are linear as well. Without fairness constraints, for any $v_j$ that decreases with $j$, the solution is the permutation matrix [5] that ranks the documents in decreasing order of utility. Moreover, a whole class of fairness constraints can be expressed in the linear form:

$$f^T P g = h.$$

Sampling Rankings

The authors employ the Birkhoff–von Neumann (BvN) decomposition [6] to compute $R$ from $P$. Specifically, if $A$ is a doubly stochastic matrix, the decomposition takes the form:

$$A = \theta_1 A_1 + \theta_2 A_2 + \dots + \theta_n A_n,$$

where $0 \le \theta_i \le 1$ with $\sum_i \theta_i = 1$, and the $A_i$ are permutation matrices. Each $A_i$ corresponds to a deterministic ranking of the document set, and $\theta_i$ corresponds to the probability of sampling that ranking.

The decomposition can be computed efficiently in polynomial time, and by the Marcus–Ree theorem [6] it requires no more than $(N-1)^2 + 1$ permutation matrices.
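The paper does not ship code, but a greedy BvN decomposition is straightforward to sketch. The version below is my own (assuming scipy is available): it repeatedly finds a perfect matching on the positive entries using the Hungarian solver and peels off the largest coefficient it can:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_von_neumann(P, tol=1e-9):
    """Greedy BvN decomposition: P ~ sum_i thetas[i] * perms[i]."""
    P = P.copy()
    thetas, perms = [], []
    while P.max() > tol:
        # Find a perfect matching that only uses positive entries of P;
        # Birkhoff's theorem guarantees one exists for doubly stochastic P.
        support = (P > tol).astype(float)
        rows, cols = linear_sum_assignment(-support)
        theta = P[rows, cols].min()    # largest coefficient we can subtract
        if theta <= tol:               # numerical safeguard against drift
            break
        perm = np.zeros_like(P)
        perm[rows, cols] = 1.0
        thetas.append(theta)
        perms.append(perm)
        P -= theta * perm              # at least one entry becomes exactly 0
    return thetas, perms
```

Each iteration zeroes at least one entry of the remainder, so the loop terminates after at most as many steps as there are positive entries, consistent with the $(N-1)^2 + 1$ bound above.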

Summary of Algorithm

  1. Set up the utility vector $u$ and the position discount vector $v$, as well as the constraint vectors $f$ and $g$ and the target value $h$.
  2. Solve the linear program for $P$.
  3. Compute the Birkhoff–von Neumann decomposition $P = \theta_1 P_1 + \theta_2 P_2 + \dots + \theta_n P_n$.
  4. Sample a permutation matrix $P_i$ with probability proportional to $\theta_i$ and output the corresponding ranking $r_i$.

When setting up the vectors in the first step, $u(d|q)$ is assumed to contain the true relevances; in practice it is replaced by estimates $\hat{u}(d|q)$ from a predictive model.
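To make the four steps concrete, here is a minimal sketch of the linear program using scipy.optimize.linprog (my own illustration, not the authors' code). It treats the $N^2$ entries of $P$ as a flat variable vector and encodes one fairness constraint of the form $f^T P v = 0$:

```python
import numpy as np
from scipy.optimize import linprog

def fair_ranking_lp(u, v, f):
    """Solve argmax_P u^T P v s.t. P is doubly stochastic and f^T P v = 0."""
    N = len(u)
    c = -np.outer(u, v).ravel()       # linprog minimizes, so negate u_i * v_j

    A_eq, b_eq = [], []
    for i in range(N):                # row sums: sum_j P[i, j] = 1
        row = np.zeros((N, N))
        row[i, :] = 1.0
        A_eq.append(row.ravel()); b_eq.append(1.0)
    for j in range(N):                # column sums: sum_i P[i, j] = 1
        col = np.zeros((N, N))
        col[:, j] = 1.0
        A_eq.append(col.ravel()); b_eq.append(1.0)

    A_eq.append(np.outer(f, v).ravel())   # fairness: sum_ij f_i v_j P_ij = 0
    b_eq.append(0.0)

    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0.0, 1.0)] * (N * N), method="highs")
    return res.x.reshape(N, N)
```

Combined with the `birkhoff_von_neumann` sketch above, one would then sample a permutation $P_i$ with probability $\theta_i$ and present the corresponding ranking.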

Fairness Constraints

First, to express fairness precisely, the authors define the exposure of a document $d_i$ under a probabilistic ranking $P$ as:

$$Exposure(d_i|P) = \sum_{j=1}^{N} P_{i,j} v_j.$$

The goal is to ensure that different groups $G_k$ have equitable opportunities for exposure.

Data sets

They use two data sets. One is a synthetic job-seeker data set containing 6 job seekers with probabilities of relevance to an employer of $u = (0.81, 0.80, 0.79, 0.78, 0.77, 0.76)^T$; the other is a news recommendation data set, whose ratings are divided by 5, have Gaussian noise of 0.05 added, and are finally clipped to $[0, 1]$.

Demographic Parity Constraints

This constraint enforces that the average exposure of the documents in the two groups is equal. The average exposure of a group is defined as:

$$Exposure(G_k|P) = \frac{1}{|G_k|} \sum_{d_i\in G_k} Exposure(d_i|P),$$

and this constraint can be expressed as:

$$
\begin{aligned}
& Exposure(G_0|P) = Exposure(G_1|P) \\
\Longleftrightarrow\; & \frac{1}{|G_0|}\sum_{d_i\in G_0}\sum_{j=1}^N P_{i,j} v_j = \frac{1}{|G_1|}\sum_{d_i\in G_1}\sum_{j=1}^N P_{i,j} v_j \\
\Longleftrightarrow\; & \sum_{d_i\in \mathcal{D}}\sum_{j=1}^N \left(\frac{\mathbb{1}_{d_i\in G_0}}{|G_0|} - \frac{\mathbb{1}_{d_i\in G_1}}{|G_1|}\right) P_{i,j} v_j = 0 \\
\Longleftrightarrow\; & f^T P v = 0, \quad \text{with } f_i = \frac{\mathbb{1}_{d_i\in G_0}}{|G_0|} - \frac{\mathbb{1}_{d_i\in G_1}}{|G_1|}.
\end{aligned}
$$
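As an illustration (my own, assuming the first three job seekers form $G_0$ and the rest form $G_1$), building $f$ and solving the LP from the earlier sketch looks like:

```python
import numpy as np

u = np.array([0.81, 0.80, 0.79, 0.78, 0.77, 0.76])
v = 1.0 / np.log(1 + np.arange(1, 7))

G0, G1 = np.array([0, 1, 2]), np.array([3, 4, 5])   # assumed group split
f = np.zeros(len(u))
f[G0] = 1.0 / len(G0)     # f_i =  1/|G0| for d_i in G0
f[G1] = -1.0 / len(G1)    # f_i = -1/|G1| for d_i in G1

P = fair_ranking_lp(u, v, f)     # from the LP sketch above
exposure = P @ v                 # per-document exposures
print(exposure[G0].mean(), exposure[G1].mean())   # group means now (near-)equal
```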

Experiments

For the job-seeker data set, solving the linear program without fairness constraints yields the optimal ranking $P$ shown in the figure below.

Optimal unfair ranking that maximizes DCG. (Data set: Job-seeker)

The optimal ranking $P$ under the demographic parity constraint is shown in the figure below:

Optimal fair ranking under demographic parity. (Data set: Job-seeker)

This fair ranking can be decomposed into a mixture of two deterministic permutation matrices with the associated weights:

(c) and (d) are the BvN decomposition of the fair ranking. (Data set: Job-seeker)

The results for the news recommendation data set are shown in the figure below:

News recommendation data set with demographic parity constraint. G0: documents 0-14, G1: documents 15-24. (a) Optimal unfair ranking that maximizes DCG. (b) Optimal fair ranking under demographic parity.

Disparate Treatment Constraints

This constraint ensures that exposure is proportional to relevance. To this end, the average utility of a group is defined as:

$$U(G_k|q) = \frac{1}{|G_k|} \sum_{d_i\in G_k} u_i,$$

and the constraint requires the exposure of the two groups to be proportional to their average utilities:

$$
\begin{aligned}
& \frac{Exposure(G_0|P)}{U(G_0|q)} = \frac{Exposure(G_1|P)}{U(G_1|q)} \\
\Longleftrightarrow\; & \frac{\frac{1}{|G_0|}\sum_{d_i\in G_0}\sum_{j=1}^{N} P_{i,j} v_j}{U(G_0|q)} = \frac{\frac{1}{|G_1|}\sum_{d_i\in G_1}\sum_{j=1}^{N} P_{i,j} v_j}{U(G_1|q)} \\
\Longleftrightarrow\; & \sum_{i=1}^N\sum_{j=1}^N \left(\frac{\mathbb{1}_{d_i\in G_0}}{|G_0|\,U(G_0|q)} - \frac{\mathbb{1}_{d_i\in G_1}}{|G_1|\,U(G_1|q)}\right) P_{i,j} v_j = 0 \\
\Longleftrightarrow\; & f^T P v = 0, \quad \text{with } f_i = \frac{\mathbb{1}_{d_i\in G_0}}{|G_0|\,U(G_0|q)} - \frac{\mathbb{1}_{d_i\in G_1}}{|G_1|\,U(G_1|q)}.
\end{aligned}
$$

To measure how differently the two groups are treated, they define the Disparate Treatment Ratio (DTR):

$$DTR(G_0, G_1|P, q) = \frac{Exposure(G_0|P)\,/\,U(G_0|q)}{Exposure(G_1|P)\,/\,U(G_1|q)}.$$
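Given a solution $P$, DTR is direct to compute. A small helper (my own sketch, reusing the group indices from the example above) might look like:

```python
import numpy as np

def dtr(P, v, u, G0, G1):
    """Disparate Treatment Ratio: exposure per unit of average utility."""
    exposure = P @ v                 # Exposure(d_i|P) for every document
    exp0, exp1 = exposure[G0].mean(), exposure[G1].mean()
    return (exp0 / u[G0].mean()) / (exp1 / u[G1].mean())
```

A DTR of 1 means the two groups receive exposure exactly in proportion to their average utilities.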

The experimental results are not shown here; please refer to the paper [1] for details.

Disparate Impact Constraints

For this constraint, they first define the probability of a document being clicked:

$$
\begin{aligned}
P(\text{click on document } i) &= P(\text{examining } i) \times P(i \text{ is relevant}) \\
&= Exposure(d_i|P) \times P(i \text{ is relevant}) \\
&= \Big(\sum_{j=1}^N P_{i,j} v_j\Big) \times u_i.
\end{aligned}
$$

Then, the average clickthrough rate of the documents in a group $G_k$ is computed as:

$$CTR(G_k|P) = \frac{1}{|G_k|} \sum_{i\in G_k} \sum_{j=1}^{N} P_{i,j} u_i v_j.$$

The disparate impact constraint enforces that the expected clickthrough rate of each group is proportional to its average utility:

$$
\begin{aligned}
& \frac{CTR(G_0|P)}{U(G_0|q)} = \frac{CTR(G_1|P)}{U(G_1|q)} \\
\Longleftrightarrow\; & \frac{\frac{1}{|G_0|}\sum_{i\in G_0}\sum_{j=1}^N P_{i,j} u_i v_j}{U(G_0|q)} = \frac{\frac{1}{|G_1|}\sum_{i\in G_1}\sum_{j=1}^N P_{i,j} u_i v_j}{U(G_1|q)} \\
\Longleftrightarrow\; & \sum_{i=1}^N\sum_{j=1}^N \left(\frac{\mathbb{1}_{d_i\in G_0}}{|G_0|\,U(G_0|q)} - \frac{\mathbb{1}_{d_i\in G_1}}{|G_1|\,U(G_1|q)}\right) u_i P_{i,j} v_j = 0 \\
\Longleftrightarrow\; & f^T P v = 0, \quad \text{with } f_i = \left(\frac{\mathbb{1}_{d_i\in G_0}}{|G_0|\,U(G_0|q)} - \frac{\mathbb{1}_{d_i\in G_1}}{|G_1|\,U(G_1|q)}\right) u_i.
\end{aligned}
$$

Similarly, to measure the extent to which the disparate impact constraint is violated, the Disparate Impact Ratio (DIR) is defined as:

$$DIR(G_0, G_1|P, q) = \frac{CTR(G_0|P)\,/\,U(G_0|q)}{CTR(G_1|P)\,/\,U(G_1|q)}.$$
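Analogously, here is a sketch of DIR (again my own helper, not from the paper):

```python
import numpy as np

def dir_ratio(P, v, u, G0, G1):
    """Disparate Impact Ratio: expected clickthrough rate per unit of utility."""
    ctr = (P @ v) * u                # per-document click prob. = exposure * relevance
    ctr0, ctr1 = ctr[G0].mean(), ctr[G1].mean()
    return (ctr0 / u[G0].mean()) / (ctr1 / u[G1].mean())
```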

Conclusion

In this paper, the authors develop a generic probabilistic ranking framework for applying different fairness constraints. The three constraints are defined clearly and cleverly, reflecting the authors’ solid mathematical grounding.

This blog article summarizes the main points and reasoning; I recommend reading the original paper for the specific details. You are also welcome to discuss it here. :D

References

[1] Ashudeep Singh and Thorsten Joachims. "Fairness of Exposure in Rankings." In Proceedings of KDD 2018.
Written by Jason Li

Jason is currently a PhD Candidate in Computer Science at RMIT University.


