This page presents the performance tests used to evaluate the algorithms in PPSF.
Performance comparison: Greedy, PPDM pGA2DT, PPDM sGA2DT
Execution time:
- Execution time with different sensitive percentage
- Execution time with different min_sup
Side effects:
- F-T-H
Fail To be Hidden (F-T-H) measures how much of the sensitive information each sanitization algorithm fails to hide. It is defined as:
$$\text{F-T-H} = \frac{|SIs^{*}|}{|SIs|}$$
where |SIs| is the number of sensitive itemsets in the original database and |SIs*| is the number of sensitive itemsets still appearing in the sanitized database.
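A minimal Python sketch of F-T-H as the ratio |SIs*|/|SIs|. It assumes itemsets are frozensets of item labels and that the frequent itemsets of the sanitized database have already been mined with the same min_sup; the function name fail_to_be_hidden is illustrative only.

```python
from typing import FrozenSet, Set

Itemset = FrozenSet[str]


def fail_to_be_hidden(sensitive: Set[Itemset],
                      sanitized_frequent: Set[Itemset]) -> float:
    """F-T-H = |SIs*| / |SIs|.

    sensitive: SIs, the sensitive itemsets of the original database.
    sanitized_frequent: frequent itemsets mined from the sanitized database.
    """
    if not sensitive:
        return 0.0  # nothing to hide, so nothing can fail to be hidden
    still_frequent = sensitive & sanitized_frequent  # SIs*
    return len(still_frequent) / len(sensitive)


# Two sensitive itemsets; only {c, d} is still frequent after sanitization.
sis = {frozenset("ab"), frozenset("cd")}
fis_star = {frozenset("a"), frozenset("b"), frozenset("cd")}
print(fail_to_be_hidden(sis, fis_star))  # 0.5
```

A lower F-T-H is better: 0 means every sensitive itemset was successfully hidden.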
· F-T-H with different min_sup
· F-T-H with different sensitive percentage
- N-T-H
The Not To be Hidden (N-T-H) side effect evaluates how many non-sensitive frequent itemsets are hidden by the sanitization process. That is:
$$\text{N-T-H} = \frac{|FIs - SIs - FIs^{*}|}{|FIs - SIs|}$$
where FIs is the set of frequent itemsets discovered in the original database and SIs is the set of sensitive itemsets, so |FIs - SIs| is the number of non-sensitive frequent itemsets in the original database. FIs* denotes the set of frequent itemsets still appearing in the sanitized database, so |FIs - SIs - FIs*| is the number of non-sensitive frequent itemsets that are hidden by the sanitization process.
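The N-T-H ratio |FIs - SIs - FIs*| / |FIs - SIs| maps directly onto set differences. The sketch below assumes the same frozenset representation as for F-T-H; the function name is hypothetical.

```python
from typing import FrozenSet, Set

Itemset = FrozenSet[str]


def not_to_be_hidden(frequent: Set[Itemset],
                     sensitive: Set[Itemset],
                     sanitized_frequent: Set[Itemset]) -> float:
    """N-T-H = |FIs - SIs - FIs*| / |FIs - SIs|."""
    non_sensitive = frequent - sensitive                # FIs - SIs
    if not non_sensitive:
        return 0.0
    hidden_by_mistake = non_sensitive - sanitized_frequent  # FIs - SIs - FIs*
    return len(hidden_by_mistake) / len(non_sensitive)


fis = {frozenset("a"), frozenset("b"), frozenset("c"), frozenset("ab")}
sis = {frozenset("ab")}
fis_star = {frozenset("a"), frozenset("b")}
print(not_to_be_hidden(fis, sis, fis_star))  # 0.3333333333333333 ({c} was lost)
```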
· N-T-H with different sensitive percentage
· N-T-H with different min_sup
Fitness value:
To further compare these algorithms, the fitness values of their solutions are also compared.
- Fitness value with different sensitive percentage
- Fitness value with different min_sup
DS:
The database similarity (DS) examines the size difference between the original database and the sanitized database. If this difference is smaller for one algorithm than for the others, that algorithm has selected a better set of transactions for deletion and thus avoids deleting irrelevant transactions, whose deletion could hide more non-sensitive frequent itemsets. DS is defined as:
$$DS = \frac{|D^{*}|}{|D|}$$
where |D| is the number of transactions in the original database D and |D*| is the number of transactions in the sanitized database D*.
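A minimal sketch, assuming DS is the ratio |D*|/|D| of transaction counts (so a value closer to 1 means fewer transactions were deleted) and that transactions are sets of item labels. The function name and the toy deletion rule are hypothetical.

```python
from typing import List, Set

Transaction = Set[str]


def database_similarity(original: List[Transaction],
                        sanitized: List[Transaction]) -> float:
    """DS = |D*| / |D|: share of transactions kept after deletion."""
    if not original:
        raise ValueError("the original database D is empty")
    return len(sanitized) / len(original)


d = [{"a", "b"}, {"b", "c"}, {"a"}, {"c"}, {"a", "c"}]
# Toy deletion rule: drop every transaction that contains item "a".
d_star = [t for t in d if "a" not in t]
print(database_similarity(d, d_star))  # 0.4
```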
- DS with different sensitive percentage
- DS with different min_sup
Performance comparison: HHUIF, MSICF, MSU-MAU, MSU-MIU
Execution time:
- Execution time with different sensitive percentage
- Execution time with different database size
Number of modified transactions:
In PPDM and PPUM, the number of transactions modified during sanitization should be kept as small as possible. This measure counts how many transactions have been altered by the sanitization process; a small value indicates that few transactions have been altered.
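A small sketch of this count, assuming a hypothetical layout in which each database maps a transaction ID to an {item: quantity} dictionary. Deleting an item and lowering its quantity both change the dictionary, so both count as modifications.

```python
from typing import Dict

# Hypothetical layout: TID -> {item: purchased quantity}
Database = Dict[int, Dict[str, int]]


def count_modified_transactions(original: Database, sanitized: Database) -> int:
    """Number of transactions whose contents changed during sanitization.

    A transaction missing from the sanitized database is also counted,
    unless it was already empty in the original.
    """
    return sum(1 for tid, items in original.items()
               if sanitized.get(tid, {}) != items)


d = {1: {"a": 2, "b": 1}, 2: {"c": 3}, 3: {"a": 1, "c": 2}}
d_prime = {1: {"a": 1, "b": 1}, 2: {"c": 3}, 3: {"c": 2}}
print(count_modified_transactions(d, d_prime))  # 2
```

Here T1 had a quantity decreased and T3 lost item a, while T2 is untouched.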
- Number of modified transactions with different database size
- Number of modified transactions with different sensitive percentage
Missing rate of HUIs:
In the sanitization process for hiding sensitive high-utility itemsets (SHUIs), only the missing-cost side effect can occur, since the proposed algorithms hide SHUIs only by deleting items or decreasing their quantities. These operations never increase utility, so no new HUIs can appear; however, previously discovered HUIs may fall below the minimum utility threshold and become invalid during the sanitization process.
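This section does not state a formula for the missing rate. Assuming it is the share of non-sensitive HUIs of D whose utility in the sanitized database D′ falls below the minimum utility threshold, it can be sketched as follows. Transactions map items to purchase quantities and a separate profit table gives unit profits; these names and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # item -> purchase quantity
Profit = Dict[str, int]       # item -> unit profit (external utility)


def itemset_utility(itemset: FrozenSet[str], db: List[Transaction],
                    profit: Profit) -> int:
    """u(X): utility of X summed over every transaction that contains all of X."""
    total = 0
    for t in db:
        if all(t.get(i, 0) > 0 for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def missing_rate(non_sensitive_huis: List[FrozenSet[str]],
                 sanitized_db: List[Transaction], profit: Profit,
                 min_util: int) -> float:
    """ASSUMED form: share of non-sensitive HUIs of D that are no longer HUIs in D'."""
    if not non_sensitive_huis:
        return 0.0
    missing = [x for x in non_sensitive_huis
               if itemset_utility(x, sanitized_db, profit) < min_util]
    return len(missing) / len(non_sensitive_huis)


profit = {"a": 5, "b": 2, "c": 1}
d_prime = [{"a": 1, "b": 2}, {"b": 3, "c": 4}, {"a": 2, "c": 1}]
huis = [frozenset("b"), frozenset("ac")]  # non-sensitive HUIs found in D
print(missing_rate(huis, d_prime, profit, min_util=11))  # 0.5
```

In D′, u({b}) = 10 drops below the threshold of 11 while u({a, c}) = 11 still meets it, so half of the non-sensitive HUIs are missing.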
- Missing rate with different database size
- Missing rate with different sensitive percentage
Database Structure Similarity:
The Database Structure Similarity (DSS) measure evaluates the structural similarity of the database before and after the sanitization process. DSS considers only whether items or itemsets are present or absent in transactions, and on this basis assesses the degree of similarity between the original database and the sanitized one. The DSS measure is formalized as follows, where the first term in the parentheses is the frequency of a pattern in the original database D and the last term in the parentheses is the frequency of the pattern in the perturbed database D′.
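The DSS formula itself is not reproduced here. As a hedged sketch only, the code below assumes DSS aggregates the per-pattern differences (f_D(X) - f_D′(X)) as a Euclidean distance over a chosen set of patterns, with f the relative frequency based on presence or absence alone; under this assumed form, 0 means the two databases are structurally identical. Function names are hypothetical.

```python
import math
from typing import FrozenSet, List, Set


def frequency(pattern: FrozenSet[str], db: List[Set[str]]) -> float:
    """Relative frequency: share of transactions containing every item of the pattern."""
    if not db:
        return 0.0
    return sum(1 for t in db if pattern <= t) / len(db)


def dss(patterns: List[FrozenSet[str]], original: List[Set[str]],
        sanitized: List[Set[str]]) -> float:
    """ASSUMED form: sqrt(sum over X of (f_D(X) - f_D'(X))^2)."""
    return math.sqrt(sum((frequency(x, original) - frequency(x, sanitized)) ** 2
                         for x in patterns))


d = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}]
d_prime = [{"a"}, {"a"}, {"b"}, {"a", "b"}]  # b removed from the first transaction
patterns = [frozenset("a"), frozenset("b"), frozenset("ab")]
print(dss(patterns, d, d_prime))  # about 0.3536, i.e. sqrt(0.125)
```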
- DSS with different database size
- DSS with different sensitive percentage
Database Utility Similarity:
The Database Utility Similarity (DUS) measures the amount of utility lost across the entire dataset, that is, the utility removed by the sanitization process, which makes it a suitable criterion for PPUM. A higher DUS indicates that less information is lost in the sanitization process. DUS is formally defined as follows:
$$DUS = \frac{\sum_{T_q \in D'} tu(T_q)}{\sum_{T_q \in D} tu(T_q)}$$
where tu(Tq) denotes the total utility of transaction Tq.
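A minimal sketch of DUS as the total transaction utility of D′ divided by that of D, assuming tu(Tq) is the sum of quantity times unit profit over the items of Tq. The dictionary layout and function names are illustrative assumptions.

```python
from typing import Dict, List

Transaction = Dict[str, int]  # item -> purchase quantity
Profit = Dict[str, int]       # item -> unit profit


def transaction_utility(t: Transaction, profit: Profit) -> int:
    """tu(Tq): sum of quantity * unit profit over the items of Tq."""
    return sum(q * profit[item] for item, q in t.items())


def database_utility_similarity(original: List[Transaction],
                                sanitized: List[Transaction],
                                profit: Profit) -> float:
    """DUS = sum of tu(Tq) over D' divided by sum of tu(Tq) over D."""
    total = sum(transaction_utility(t, profit) for t in original)
    if total == 0:
        raise ValueError("the original database has zero total utility")
    kept = sum(transaction_utility(t, profit) for t in sanitized)
    return kept / total


profit = {"a": 5, "b": 2}
d = [{"a": 2, "b": 1}, {"b": 4}]        # tu = 12 and 8, total 20
d_prime = [{"a": 1, "b": 1}, {"b": 4}]  # tu = 7 and 8, total 15
print(database_utility_similarity(d, d_prime, profit))  # 0.75
```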
- DUS with different database size
- DUS with different sensitive percentage
Itemset Utility Similarity:
In addition to the DUS criterion, an Itemset Utility Similarity (IUS) measure assesses the utility loss of the discovered HUIs before and after the sanitization process of PPUM. This criterion is similar to the missing cost (the missing rate of the discovered HUIs) but gives a more realistic assessment for PPUM, especially when the gap between the utilities of the discovered HUIs and the minimum utility threshold is large. The IUS criterion is defined as follows.
where HUIs_D and HUIs_D′ denote the high-utility itemsets (HUIs) found in the original database D and the sanitized database D′, respectively.
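The IUS formula is not reproduced in this section. Assuming IUS is the sum of u(X) over HUIs_D′ divided by the sum of u(X) over HUIs_D, a sketch follows; each input maps an HUI to its utility in the respective database, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet

HUITable = Dict[FrozenSet[str], int]  # HUI -> its utility in that database


def itemset_utility_similarity(huis_d: HUITable, huis_d_prime: HUITable) -> float:
    """ASSUMED form: sum of u(X) over HUIs of D' / sum of u(X) over HUIs of D."""
    total = sum(huis_d.values())
    if total == 0:
        raise ValueError("no HUIs were found in the original database")
    return sum(huis_d_prime.values()) / total


huis_d = {frozenset("a"): 40, frozenset("ab"): 30, frozenset("c"): 30}
huis_d_prime = {frozenset("a"): 35, frozenset("c"): 30}  # {a, b} was hidden
print(itemset_utility_similarity(huis_d, huis_d_prime))  # 0.65
```

In this toy case {a} remains an HUI in D′, so a missing-rate count would not register it, yet its utility drop from 40 to 35 still lowers IUS under the assumed form.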
- IUS with different database size
- IUS with different sensitive percentage