Fix a finite set the space of user reports. A dataset is an element of , namely a tuple consisting of elements of . Let hist be the histogram of : for any the th component of hist is the number of occurrences of in the dataset We will consider datasets to be equivalent if they have the same histogram (i.e., the ordering of the elements does not matter). For a multiset whose elements are in we will also write hist to denote the histogram of (so that the th component is the number of copies of in ).
Let and consider a dataset For an element let be the frequency of in namely the fraction of elements of that are equal to . Two datasets are said to be neighboring if they differ in a single element meaning that we can write (up to equivalence) and In this case, we write Let be a set; we now define the differential privacy of a randomized function as follows.
Definition (Differential privacy (DP) ): A randomized algorithm is if for every pair of neighboring datasets and for every set we have
where the probabilities are taken over the randomness in Here, and If then we use -DP for brevity and informally refer to it as pure-DP ; if we refer to it as approximate-DP.