Today, I’m going to share the main points and concepts I took away from a fantastic paper, “Fairness of Exposure in Rankings”, written by Ashudeep Singh and Thorsten Joachims.
In this paper, the authors introduce fairness constraints based on exposure in rankings. They propose three forms of fairness constraints, namely demographic parity, disparate treatment, and disparate impact, to balance fairness to the items being ranked against ranking utility.
Demographic parity states that protected groups (defined by sensitive attributes such as gender) should have equal opportunities to be shown or recommended. For example, male and female job seekers should receive equal exposure on a job-search website. It is reasonable to distribute exposure more evenly, even at some cost in ranking utility.
Disparate treatment refers to a kind of intentional discrimination, for example, testing particular skills only for female applicants on a job-search website.
Disparate impact
Unlike disparate treatment, disparate impact refers to unintentional discrimination. For example, female job seekers, who make up a smaller proportion of all job seekers, might unintentionally receive less exposure.
The framework for fair ranking
As the authors of this paper point out, the definition of fairness depends on context and application, which means that different fairness constraints should be applied in different scenarios. They develop a general framework for computing utility-maximizing rankings under different fairness constraints. Generally, the problem of optimal ranking under fairness constraints can be formulated as the following optimization problem:
$$r^* = \operatorname{argmax}_{r} U(r \mid q),$$

where $U(r \mid q)$ denotes the utility of a ranking $r$ for query $q$, and the ranking $r$ specifies the order of a set of documents $\mathcal{D} = \{d_1, d_2, d_3, \dots, d_N\}$.
Utility
The utility measure of a ranking system evaluates the utility of a ranking from the relevance of the individual items being ranked. Let $\text{rel}(d \mid u, q)$ denote the relevance of document $d$ when user $u$ issues query $q$; the relevance could be a binary label, a movie rating, or a real-valued score. In information retrieval systems, the generic way to express utility measures is:
$$U(r \mid q) = \sum_{u \in \mathcal{U}} P(u \mid q) \sum_{d \in \mathcal{D}} v(\text{rank}(d \mid r)) \, \lambda(\text{rel}(d \mid u, q)),$$
where $\mathcal{U}$ is the set of all users who issue the identical query $q$, $v$ reflects the importance of each ranking position, and $\lambda$ is a function mapping relevance to its utility. Take Discounted Cumulative Gain (DCG) as an example, with $v(\text{rank}) = \frac{1}{\log(1 + \text{rank})}$ and $\lambda(\text{rel}) = 2^{\text{rel}} - 1$:

$$U_{\text{DCG}}(r \mid q) = \sum_{u \in \mathcal{U}} P(u \mid q) \sum_{d \in \mathcal{D}} \frac{2^{\text{rel}(d \mid u, q)} - 1}{\log(1 + \text{rank}(d \mid r))}.$$

Then

$$u(d \mid q) = \sum_{u \in \mathcal{U}} P(u \mid q) \, \lambda(\text{rel}(d \mid u, q))$$

is the expected utility of a document $d$ for query $q$. $u(d \mid q)$ can be understood as the probability of relevance if the $\text{rel}(d \mid u, q)$ are binary relevances. Therefore, sorting the documents by $u(d \mid q)$ is the effective way to maximize the utility of the ranking:
$$\operatorname{argmax}_{r} U(r \mid q) \equiv \operatorname{argsort}_{d \in \mathcal{D}} u(d \mid q).$$
Based on this equation, we can work through an example:
| Document ID | Position | Binary Relevance Score | $v(\text{rank}(d \mid r)) \cdot u(d \mid q)$ |
|---|---|---|---|
| 0 | 1 | 0 | $\frac{2^0 - 1}{\log(1 + 1)}$ |
| 1 | 2 | 1 | $\frac{2^1 - 1}{\log(1 + 2)}$ |
| 2 | 3 | 0 | $\frac{2^0 - 1}{\log(1 + 3)}$ |
| 3 | 4 | 1 | $\frac{2^1 - 1}{\log(1 + 4)}$ |
| 4 | 5 | 1 | $\frac{2^1 - 1}{\log(1 + 5)}$ |
| 5 | 6 | 0 | $\frac{2^0 - 1}{\log(1 + 6)}$ |
We can easily obtain the best ranking by sorting the documents by $u(d \mid q)$; the ranking with maximum utility is $\{d_1, d_3, d_4, d_0, d_2, d_5\}$.
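To make this concrete, here is a minimal Python sketch (assuming the natural-log discount used in the table above) that computes the DCG utility of a ranking and recovers the utility-maximizing order by sorting on the binary relevances:

```python
import numpy as np

# Binary relevance scores of documents d0..d5, taken from the table above.
rel = np.array([0, 1, 0, 1, 1, 0])

def dcg(ranking, rel):
    """DCG utility: sum over positions j of (2^rel - 1) / log(1 + j)."""
    gains = 2.0 ** rel[ranking] - 1.0
    discounts = np.log(1.0 + np.arange(1, len(ranking) + 1))
    return np.sum(gains / discounts)

original = np.arange(6)                 # d0, d1, ..., d5 in the table's order
best = np.argsort(-rel, kind="stable")  # sort by u(d|q) descending

print(best)                # [1 3 4 0 2 5] -> the ranking {d1, d3, d4, d0, d2, d5}
print(dcg(original, rel))  # ~2.09, utility of the original order
print(dcg(best, rel))      # ~3.07, the maximum utility
```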
Probabilistic Rankings
In this paper, the authors use a probabilistic ranking $R$, i.e., a distribution over rankings, instead of a single deterministic ranking $r$, to avoid combinatorial optimization:

$$U(R \mid q) = \sum_{r} R(r) \, U(r \mid q).$$

To avoid summing over an exponential number of rankings, the authors compute the utility via the marginal rank distributions of the documents.
Specifically, let $P_{i,j}$ be the probability that $R$ places document $d_i$ at rank $j$. The probabilities $P_{i,j}$ form a doubly stochastic matrix $P$ of size $N \times N$, which satisfies

$$\sum_i P_{i,j} = \sum_j P_{i,j} = 1.$$

Therefore, the expected utility of a probabilistic ranking can be computed as:
$$U(P \mid q) = \sum_{d_i \in \mathcal{D}} \sum_{j=1}^{N} P_{i,j} \, u(d_i \mid q) \, v(j),$$
where $u$ is the column vector of size $N$ whose entries are the expected utilities $u(d_i \mid q)$, and $v$ is the column vector of position discounts, also of size $N$. Thus, the expected utility can be rewritten as:

$$U(P \mid q) = u^T P v.$$
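As a quick sanity check, the matrix form is straightforward to evaluate. The sketch below assumes a DCG-style discount $v_j = 1/\log_2(1 + j)$ and uses the job-seeker relevances from the paper’s synthetic example (introduced below), with the identity matrix as a deterministic ranking:

```python
import numpy as np

N = 6
u = np.array([0.81, 0.80, 0.79, 0.78, 0.77, 0.76])  # expected utilities u(d_i|q)
v = 1.0 / np.log2(1.0 + np.arange(1, N + 1))        # position discounts v(j)

P = np.eye(N)     # deterministic ranking: d_i placed at rank i+1
print(u @ P @ v)  # expected utility U(P|q) = u^T P v
```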
Optimizing Fair Rankings via Linear Programming
Since a probabilistic ranking $R$ can be recovered from any doubly stochastic matrix $P$ (see the Sampling Rankings section below), the authors formulate the problem of finding the utility-maximizing probabilistic ranking under fairness constraints in terms of doubly stochastic matrices instead of distributions over rankings:

$$P^* = \operatorname{argmax}_{P} \; u^T P v \quad \text{s.t.} \quad \mathbb{1}^T P = \mathbb{1}^T, \;\; P \mathbb{1} = \mathbb{1}, \;\; 0 \le P_{i,j} \le 1, \;\; P \text{ is fair.}$$

The optimization objective is linear in the $N^2$ variables $P_{i,j}$ ($1 \le i, j \le N$), and the fairness constraints are linear as well. Without fairness constraints, and for any $v_j$ that decreases with $j$, the solution is the permutation matrix that ranks the documents in decreasing order of utility. The fairness constraints themselves can be expressed in the linear form

$$f^T P g = h.$$
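The sketch below is a minimal Python/SciPy implementation of this linear program (not the authors’ code): it flattens $P$ into $N^2$ variables, adds the row/column-sum constraints, and one fairness constraint. The function name `solve_fair_ranking_lp`, the hypothetical two-group split of the six job seekers, and the choice $g = v$, $h = 0$ for demographic parity are my own assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def solve_fair_ranking_lp(u, v, f, g=None, h=0.0):
    """Utility-maximizing doubly stochastic P under one constraint f^T P g = h."""
    N = len(u)
    g = v if g is None else g
    c = -np.outer(u, v).ravel()              # maximize u^T P v -> minimize -sum u_i v_j P_ij
    A_eq, b_eq = [], []
    for i in range(N):                        # each row of P sums to 1
        row = np.zeros((N, N)); row[i, :] = 1.0
        A_eq.append(row.ravel()); b_eq.append(1.0)
    for j in range(N):                        # each column of P sums to 1
        col = np.zeros((N, N)); col[:, j] = 1.0
        A_eq.append(col.ravel()); b_eq.append(1.0)
    A_eq.append(np.outer(f, g).ravel())       # fairness constraint f^T P g = h
    b_eq.append(h)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1))
    return res.x.reshape(N, N)

# Synthetic job-seeker example: demographic parity between two hypothetical
# groups G0 = {d0, d1, d2} and G1 = {d3, d4, d5}.
u = np.array([0.81, 0.80, 0.79, 0.78, 0.77, 0.76])
v = 1.0 / np.log2(1.0 + np.arange(1, 7))
f = np.array([1/3, 1/3, 1/3, -1/3, -1/3, -1/3])  # avg exposure of G0 minus G1
P = solve_fair_ranking_lp(u, v, f)
print("expected utility:", u @ P @ v)
```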
Sampling Rankings
The authors employ the Birkhoff-von Neumann (BvN) decomposition to compute $R$ from $P$. Specifically, if $A$ is a doubly stochastic matrix, it can be decomposed as

$$A = \theta_1 A_1 + \theta_2 A_2 + \dots + \theta_n A_n,$$
where $0 \le \theta_i \le 1$, $\sum_i \theta_i = 1$, and the $A_i$ are permutation matrices. Each $A_i$ corresponds to a deterministic ranking of the document set, and each $\theta_i$ is the probability of sampling that ranking.
The decomposition can be computed efficiently in polynomial time, and according to the Marcus-Ree theorem, it needs no more than $(N-1)^2 + 1$ permutation matrices.
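Here is a sketch of a greedy BvN decomposition, one common way to compute it (not necessarily the authors’ exact procedure): repeatedly find a permutation supported on the positive entries of the matrix via an assignment solver, peel it off with the smallest matched entry as its weight, and continue until nothing remains:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_von_neumann(P, tol=1e-9):
    """Greedy BvN decomposition: P ~= sum_k thetas[k] * perms[k]."""
    P = P.copy()
    thetas, perms = [], []
    while P.max() > tol:
        # A large cost on (near-)zero entries steers the assignment solver
        # toward a permutation that lies in the support of P; Birkhoff's
        # theorem guarantees such a permutation exists.
        cost = np.where(P > tol, -P, 1e6)
        rows, cols = linear_sum_assignment(cost)
        theta = P[rows, cols].min()       # weight of this permutation
        perm = np.zeros_like(P)
        perm[rows, cols] = 1.0
        thetas.append(theta)
        perms.append(perm)
        P -= theta * perm                 # peel it off and continue
    return np.array(thetas), perms
```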
Summary of Algorithm
1. Set up the utility vector $u$ and position-discount vector $v$, as well as the constraint vectors $f$ and $g$ and the target value $h$.
2. Solve the linear program for $P$.
3. Compute the Birkhoff-von Neumann decomposition $P = \theta_1 P_1 + \theta_2 P_2 + \dots + \theta_n P_n$.
4. Sample a permutation matrix $P_i$ with probability proportional to $\theta_i$ and output the corresponding ranking $r_i$.
When setting up the vectors in the first step, $u(d \mid q)$ is assumed to be the true relevance; in practice, an estimate $\hat{u}(d \mid q)$ comes from some predictive model.
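Putting the pieces together, a minimal end-to-end driver, reusing the `solve_fair_ranking_lp` and `birkhoff_von_neumann` sketches from above (with `u`, `v`, and `f` defined as in the LP example), might look like:

```python
P_star = solve_fair_ranking_lp(u, v, f)       # step 2: solve the LP
thetas, perms = birkhoff_von_neumann(P_star)  # step 3: BvN decomposition
k = np.random.choice(len(thetas), p=thetas / thetas.sum())
ranking = perms[k].argmax(axis=1)             # step 4: rank position of each document
```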
Fairness Constraints
First, to make fairness precise, the authors define the exposure of a document $d_i$ under a probabilistic ranking $P$ as:
$$\text{Exposure}(d_i \mid P) = \sum_{j=1}^{N} P_{i,j} \, v_j.$$
The goal is to ensure that different groups $G_k$ have equal opportunities for exposure.
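In matrix form, the exposure of all documents at once is just a matrix-vector product; a one-line sketch:

```python
def exposure(P, v):
    # Exposure(d_i | P) = sum_j P[i, j] * v[j]; returns one value per document.
    return P @ v
```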
Data sets
They use two datasets: a synthetic dataset containing 6 job seekers whose probabilities of relevance to an employer are $u = (0.81, 0.80, 0.79, 0.78, 0.77, 0.76)^T$, and a news recommendation dataset whose ratings are divided by 5, perturbed with Gaussian noise of standard deviation 0.05, and finally clipped to $[0, 1]$.
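The news-dataset preprocessing described above is easy to reproduce; a sketch with hypothetical raw ratings:

```python
import numpy as np

rng = np.random.default_rng(0)
ratings = np.array([4.0, 3.5, 5.0, 2.0])  # hypothetical raw ratings on a 0-5 scale
# Scale to [0, 1], add Gaussian noise (std 0.05), then clip back to [0, 1].
rel = np.clip(ratings / 5.0 + rng.normal(0.0, 0.05, ratings.shape), 0.0, 1.0)
```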
Demographic Parity Constraints
This constraint enforces that the average exposure of the documents in both groups is equal. The average exposure within a group is defined as

$$\text{Exposure}(G_k \mid P) = \frac{1}{|G_k|} \sum_{d_i \in G_k} \text{Exposure}(d_i \mid P),$$

and the constraint requires $\text{Exposure}(G_0 \mid P) = \text{Exposure}(G_1 \mid P)$, which can be written in the linear form $f^T P g = h$.
For the job-seeker dataset, after solving the linear program without fairness constraints, the figure below shows the optimal ranking in terms of $P$.
The optimal ranking in terms of $P$ under the demographic parity constraint is shown in the figure below:
This fair ranking can be decomposed into a mixture of two deterministic permutation matrices with the associated weights:
For the news recommendation dataset, the results are shown in the figure below:
Disparate Treatment Constraints
This constraint ensures that exposure is proportional to relevance. To this end, the average utility of a group is defined as
$$U(G_k \mid P) = \frac{1}{|G_k|} \sum_{d_i \in G_k} u_i,$$
and the constraint requires the exposure of the two groups to be proportional to their average utility:

$$\frac{\text{Exposure}(G_0 \mid P)}{U(G_0 \mid P)} = \frac{\text{Exposure}(G_1 \mid P)}{U(G_1 \mid P)}.$$
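This ratio constraint is still linear in $P$. Here is a sketch of the corresponding constraint vector $f$ (with $g = v$ and $h = 0$, and the same hypothetical two-group split used in the earlier LP sketch):

```python
import numpy as np

u = np.array([0.81, 0.80, 0.79, 0.78, 0.77, 0.76])
G0, G1 = [0, 1, 2], [3, 4, 5]            # hypothetical group memberships
U_G0, U_G1 = u[G0].mean(), u[G1].mean()  # average group utilities U(G_k|P)

# Exposure(G0|P)/U(G0|P) - Exposure(G1|P)/U(G1|P) = 0, rewritten as f^T P v = 0:
f = np.zeros(len(u))
f[G0] = 1.0 / (len(G0) * U_G0)
f[G1] = -1.0 / (len(G1) * U_G1)
# Plug this f into solve_fair_ranking_lp(u, v, f) from the earlier sketch.
```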
In this paper, the authors develop a generic probabilistic ranking framework for applying different fairness constraints. The three constraints they define are clear and clever, reflecting the authors’ solid mathematical skills.
This blog article summarizes the main points and reasoning; I recommend reading the original paper for the specific details. You are also welcome to discuss here. :D