Example of running the HHUIF algorithm

This example explains how to run the HHUIF algorithm using the PPSF open-source privacy-preserving library.

How to run this example?

  • If you are using the graphical interface, (1) choose the "HHUIF" algorithm, (2) select the corresponding input database file and utility table file, (3) set the output file name (e.g. "output_HHUIF.txt"), (4) set minutil to 0.225, (5) set the sensitive percentage to 0.15 and (6) click "Run algorithm".
  • If you are using the source code version of PPSF, launch the file "main.java" in the package "PPSF/src/gui"; the next steps are the same as above.

What is the input of HHUIF?

The input is a transaction database, a utility table, a minimum utility threshold and a sensitive percentage threshold.

A transaction database is a set of transactions. Each transaction is a set of items and their quantities. For example, consider the following transaction database. The number before ":" is the name of the item. The number after ":" is the quantity of that item. Note that an item may not appear twice in the same transaction, and that the items in a transaction are assumed to be sorted in lexicographical order.

Transaction id Items and their quantities
t1 1:4    3:3    9:5 ...
t2 2:2    3:3    9:3 ...
t3 2:2    4:2    9:1 ...
t4 1:1    3:3    10:2 ...
t5 2:3    3:3    9:4 ...
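
The two constraints stated above (no repeated item, items in sorted order) can be checked before running the algorithm. The following is a minimal Python sketch and is not part of PPSF. The function name check_transaction is hypothetical, and it assumes that items are integer ids compared numerically, which matches the sample rows (where 10 follows 3).

```python
def check_transaction(pairs):
    """Return a list of problems found in one transaction.

    `pairs` is a list of (item, quantity) tuples in the order they appear.
    Assumption: items are integer ids and "sorted" means ascending numeric order.
    """
    problems = []
    items = [item for item, _ in pairs]
    if len(set(items)) != len(items):
        problems.append("an item appears more than once")
    if items != sorted(items):
        problems.append("items are not in ascending order")
    return problems


# Visible part of t4 from the table above (the "..." part is not reproduced).
print(check_transaction([(1, 1), (3, 3), (10, 2)]))  # []
# A deliberately invalid transaction: item 3 is repeated and the order is wrong.
print(check_transaction([(3, 3), (1, 1), (3, 2)]))
```

An empty list means the transaction satisfies both constraints.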

A utility table lists the utility of each item. An example is given below. The left column contains the names of the items. The right column contains the utility of each item.

Items Utility
0 57
1 76
2 151
3 35
4 41
5 118
6 74
7 44
8 225
9 219
10 200
... ...
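
To make the role of the utility table concrete, the sketch below computes the utility of a transaction as the sum of quantity × unit utility over its items. This is the usual definition in high-utility itemset mining; that PPSF computes it in exactly this way is an assumption, and the function name transaction_utility is hypothetical. Only the items visible in the tables above are used.

```python
# Unit utilities copied from the utility table above (items 0 to 10 only).
UTILITY = {0: 57, 1: 76, 2: 151, 3: 35, 4: 41, 5: 118,
           6: 74, 7: 44, 8: 225, 9: 219, 10: 200}


def transaction_utility(transaction, utility):
    """Sum quantity * unit utility over the (item, quantity) pairs."""
    return sum(quantity * utility[item] for item, quantity in transaction)


# Visible part of t1: 1:4 3:3 9:5 (the "..." part is not reproduced).
t1_prefix = [(1, 4), (3, 3), (9, 5)]
print(transaction_utility(t1_prefix, UTILITY))  # 4*76 + 3*35 + 5*219 = 1504
```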

What is the output of the algorithm?

The output is a sanitised transaction database obtained after the deletion operations. It consists of a set of transactions, each listing items and their quantities, whose utility is lower than the safe threshold. The following table is the output after running the HHUIF algorithm using "mushroom_UM_New.txt" as the input database and "mushroom_UM_UtilityTable.txt" as the utility table. The minimum utility threshold is set to 0.1 and the sensitive percentage is set to 0.15.

Transaction id Items and their quantities
t1 1:4    3:3    9:5   ...
t2 2:2     3:3    9:3   ...
t3 55:4    99:5    23:2   ...
t4 23:3    34:4    67:4   ...
t5 99:2    34:4    67:5   ...

Input file format

The input database file format for HHUIF is defined as follows. It is a text file. Items and their quantities are both represented by positive integers. Each line of the text file is one transaction. In each line (transaction), an item and its quantity are separated by a colon, and consecutive item:quantity pairs are separated by a single space.

For the previous example, the input file is as follows:

1:4 3:3 9:5 ...

2:2 3:3 9:3 ...

2:2 4:2 9:1 ...

1:1 3:3 10:2 ...

2:3 3:3 9:4 ...
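
The database format described above can be read with a few lines of code. The following is a minimal Python sketch, not the PPSF reader; the function names parse_transaction and parse_database are hypothetical. It skips empty lines and splits on any whitespace, which is slightly more lenient than the single space the format specifies.

```python
def parse_transaction(line):
    """Parse one line such as "1:4 3:3 9:5" into a list of (item, quantity)."""
    pairs = []
    for token in line.split():
        item, quantity = token.split(":")
        pairs.append((int(item), int(quantity)))
    return pairs


def parse_database(text):
    """Parse the text of a database file, one transaction per non-empty line."""
    return [parse_transaction(line) for line in text.splitlines() if line.strip()]


# Visible part of the first two transactions (the "..." part is not reproduced).
sample = "1:4 3:3 9:5\n2:2 3:3 9:3\n"
print(parse_database(sample))
# [[(1, 4), (3, 3), (9, 5)], [(2, 2), (3, 3), (9, 3)]]
```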

The input utility table file format for HHUIF is defined as follows. It is also a text file. Each line represents an item and its utility value. An item and its utility are separated by a comma and a single space. For the previous data, the input utility table file is as follows:

0, 57

1, 76

2, 151

3, 35

4, 41

5, 118

6, 74

7, 44

8, 225

9, 219

10, 200

...
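
A utility table file in this format can be loaded into a dictionary. The following is a minimal Python sketch, not the PPSF reader; the function name parse_utility_table is hypothetical, and it assumes that utilities are integers, as in the sample above.

```python
def parse_utility_table(text):
    """Parse lines such as "9, 219" into a dict mapping item -> utility."""
    utilities = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        item, value = line.split(",")
        # int() ignores the space that follows the comma.
        utilities[int(item)] = int(value)
    return utilities


# First three entries of the sample utility table.
sample = "0, 57\n1, 76\n2, 151\n"
print(parse_utility_table(sample))  # {0: 57, 1: 76, 2: 151}
```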

Output file format

The output file format is the same as the input database file format. For the previous example, the output file is as follows:

1:4 3:3 9:5 ...

2:2 3:3 9:3 ...

55:4 99:5 23:2 ...

23:3 34:4 67:4 ...

99:2 34:4 67:5 ...
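
Because the output uses the same format as the input database, the two files can be compared line by line to see which transactions were modified. The following is a minimal Python sketch with a hypothetical function name, changed_transactions. It assumes that the output keeps one line per input transaction, in the same order, as the two tables above suggest.

```python
def changed_transactions(original_lines, sanitised_lines):
    """Return the 1-based positions of the transactions that differ."""
    changed = []
    pairs = zip(original_lines, sanitised_lines)
    for position, (before, after) in enumerate(pairs, start=1):
        # Compare token lists so that extra spaces do not count as a change.
        if before.split() != after.split():
            changed.append(position)
    return changed


# Visible parts of the sample input and output (the "..." parts are not reproduced).
original = ["1:4 3:3 9:5", "2:2 3:3 9:3", "2:2 4:2 9:1", "1:1 3:3 10:2", "2:3 3:3 9:4"]
sanitised = ["1:4 3:3 9:5", "2:2 3:3 9:3", "55:4 99:5 23:2", "23:3 34:4 67:4", "99:2 34:4 67:5"]
print(changed_transactions(original, sanitised))  # [3, 4, 5]
```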