cover_image

Differential Privacy

Kurt Pan XPTY
2021年01月07日 13:27

Fix a finite set  the space of user reports. A dataset  is an element of , namely a tuple consisting of elements of . Let hist  be the histogram of  : for any  the  th component of hist  is the number of occurrences of  in the dataset  We will consider datasets  to be equivalent if they have the same histogram (i.e., the ordering of the elements  does not matter). For a multiset  whose elements are in  we will also write hist  to denote the histogram of  (so that the  th component is the number of copies of  in  ).

Let  and consider a dataset  For an element  let  be the frequency of  in  namely the fraction of elements of  that are equal to . Two datasets  are said to be neighboring if they differ in a single element meaning that we can write (up to equivalence)  and  In this case, we write  Let  be a set; we now define the differential privacy of a randomized function  as follows.

Definition (Differential privacy (DP) ): A randomized algorithm  is  if for every pair of neighboring datasets  and for every set  we have

where the probabilities are taken over the randomness in  Here,  and  If  then we use  -DP for brevity and informally refer to it as pure-DP ; if  we refer to it as approximate-DP.