Extending High-Utility Pattern Mining with Facets and Advanced Utility Functions (Extended Abstract)
1. The 37th International Conference on Logic Programming
Sept. 20–27, 2021 (virtual event)
Francesco Cauteruccio and Giorgio Terracina
DEMACS, University of Calabria, Italy
{cauteruccio, terracina}@mat.unical.it
2. Context and Motivation
• Pattern mining is one of the most studied branches of data mining:
• Find interesting patterns (sets of items) in a database of transactions.
• Examples: frequent pattern mining, sequential pattern mining, etc.
• High-Utility Pattern Mining (HUPM)
• Find patterns having a high utility (w.r.t. some utility measure).
• Example: in a sales database, the utility of a pattern may be represented by the profit of items sold together.
• Basic assumption: each item is associated with one static utility.
• However…
• The utility of an item can be defined from very different points of view.
• Transactions are not only flat lists of items; they can provide different levels of abstraction.
Pattern Mining and High-Utility Pattern Mining
3. Context and Motivation
• We present a framework for HUPM extending basic notions and introducing:
• Transaction set representation: for each transaction, different levels of aggregation can be defined.
• Facets: attributes that can be associated with any level of aggregation of the framework.
• Advanced utility functions: a taxonomy of functions that can be combined in several ways to fit different notions of utility.
• Given a pattern 𝑃, we say that 𝑃 is an extended high-utility pattern if its utility 𝑢(𝑃) is greater than a minimum threshold 𝑡ℎᵤ and it occurs in at least 𝑡ℎₒ transactions.
• The problem of extended high-utility pattern mining (e-HUPM) is to discover all the extended high-utility patterns in a given database 𝐷.
An extended framework for HUPM
4. Proposed Framework
• A multi-layer database representation
• 𝐷𝑎𝑡𝑎𝑏𝑎𝑠𝑒 → 𝐶𝑜𝑛𝑡𝑎𝑖𝑛𝑒𝑟 → 𝑂𝑏𝑗𝑒𝑐𝑡 → 𝑇𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛
• Given a database 𝐷 and a set of transactions {𝑇₁, … , 𝑇ₙ}
• 𝐷 is organized as a set of containers 𝐶 = {𝐶₁, … , 𝐶ₚ}
• 𝐶ᵢ can be associated with a set of objects 𝑂 = {𝑂₁, … , 𝑂ₘ}
• 𝑂ⱼ contains a set of transactions {𝑇₁, … , 𝑇ₖ}
• The facets define the utility of an item 𝑖
• Each item 𝑖 can be associated with one or more facets.
• Facets can also be defined for transactions, objects and containers.
• Values of facets are represented by utility vectors
• Item utility vector: 𝐼𝑈ᵢ = [𝑖𝑢₁, 𝑖𝑢₂, … , 𝑖𝑢ₗ], where 𝑖𝑢ₖ describes a certain facet of 𝑖
• The same applies to transactions, objects and containers.
Extending the database structure and introducing facets
Running example depicting a sales database
Item utility vector for the item 𝑖₁, containing values for the price and weight facets
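The multi-layer representation and the facets described above can be sketched as plain data structures. This is only an illustrative sketch in Python, not the authors' implementation: the class names, the `walk` helper, the reading of containers as stores and objects as customers, and all values are assumptions made for the example. Item positions inside a transaction are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Transaction:
    tid: str
    items: Dict[str, int]                               # item -> quantity
    facets: List[float] = field(default_factory=list)   # transaction utility vector


@dataclass
class Obj:
    oid: str
    transactions: List[Transaction]
    facets: List[float] = field(default_factory=list)   # object utility vector


@dataclass
class Container:
    cid: str
    objects: List[Obj]
    facets: List[float] = field(default_factory=list)   # container utility vector


@dataclass
class Database:
    containers: List[Container]
    item_facets: Dict[str, List[float]]                  # item -> item utility vector

    def walk(self) -> Iterator[Tuple[Container, Obj, Transaction]]:
        """Visit every transaction together with its enclosing object and container."""
        for c in self.containers:
            for o in c.objects:
                for t in o.transactions:
                    yield c, o, t


# Hypothetical sales data: one store (container), one customer (object), two receipts.
db = Database(
    containers=[
        Container("store_1", [
            Obj("customer_1", [
                Transaction("t1", {"bread": 2, "milk": 1}),
                Transaction("t2", {"milk": 3}),
            ]),
        ]),
    ],
    # Each item utility vector holds [price, weight]; the numbers are made up.
    item_facets={"bread": [1.5, 0.4], "milk": [0.9, 1.0]},
)
```

Because every level carries its own utility vector, a utility function can read facets from whichever level it needs while walking the hierarchy.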
5. Proposed Framework
• The advanced utility functions combine utility vectors in different ways:
1. Intra-pattern utility function
• Given a pattern and its occurrences, combine item utility vectors across all the occurrences.
2. Pattern utility function
• Combine the intra-pattern utility function with the facets from the transaction set representation.
• Generates a matrix 𝑈ₚ (occurrences × facets).
3. Utility function 𝑢(𝑃)
• Computes the utility of a pattern over 𝑈ₚ.
• Several classifications for 𝑢(𝑃)
• Horizontal-first, vertical-first, inter-transaction utility, pattern-vs-object utility, etc.
Advanced utility functions
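The three steps above can be illustrated with a small Python sketch. The concrete choices are assumptions made only for the example: the intra-pattern function is a quantity-weighted sum per facet, the matrix rows append the transaction facets to the intra-pattern vector, and the final utility sums a single column. The framework allows many other combinations, and the function names and data below are hypothetical.

```python
from typing import List, Sequence, Tuple

ItemRow = Tuple[Sequence[float], int]             # (item utility vector, quantity)
Occurrence = Tuple[List[ItemRow], Sequence[float]]  # (pattern items, transaction facets)


def intra_pattern_utility(item_rows: List[ItemRow]) -> List[float]:
    """Step 1: combine the item utility vectors of one occurrence.

    Assumed combination: quantity-weighted sum, facet by facet.
    """
    n_facets = len(item_rows[0][0])
    return [sum(vec[k] * qty for vec, qty in item_rows) for k in range(n_facets)]


def pattern_utility_matrix(occurrences: List[Occurrence]) -> List[List[float]]:
    """Step 2: build the occurrences x facets matrix.

    Each row is the intra-pattern vector followed by the transaction facets.
    """
    return [intra_pattern_utility(rows) + list(t_facets)
            for rows, t_facets in occurrences]


def utility(matrix: List[List[float]], column: int = 0) -> float:
    """Step 3: reduce the matrix to one number (assumed: sum of one column)."""
    return sum(row[column] for row in matrix)


# Two hypothetical occurrences of a two-item pattern; item vectors are [price, weight],
# and each transaction has a single made-up facet.
occurrences = [
    ([([2.0, 0.5], 3), ([1.0, 0.2], 1)], [5.0]),
    ([([2.0, 0.5], 1), ([1.0, 0.2], 2)], [3.0]),
]
U_P = pattern_utility_matrix(occurrences)
total_price = utility(U_P, column=0)
```

Swapping the body of `utility` is enough to move from a sum to, say, a correlation between two columns, which is the kind of flexibility the taxonomy is meant to capture.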
6. ASP Approach
The encoding
%%% Input schema:
%container(ContainerId)
%object(ObjectId, ContainerId)
%transaction(Tid, ObjectId)
%item(Item, Tid, Position, Q)
%itemUtilityVector(Item, I1, ..., Il)
%transactionUtilityVector(Tid, T1, ..., Tm)
%objectUtilityVector(ObjectId, O1, ..., On)
%containerUtilityVector(ContainerId, C1, ..., Co)
%%% Parameters
occurrencesThreshold(...). utilityThreshold(...).
%%% Item pre-filtering
usefulItem(I) :- item(I,_,_,_), ... .   % "..." stands for any condition on the items
%%% Candidate pattern generation
{inCandidatePattern(I)} :- usefulItem(I).
%%% Occurrences computation and check
inTransaction(Tid) :- transaction(Tid,_), not incomplete(Tid).
incomplete(Tid) :- transaction(Tid,_), inCandidatePattern(I), not contains(I,Tid).
contains(I,Tid) :- item(I,Tid,_,_).
:- #count{ Tid : inTransaction(Tid) } = N, N < Tho, occurrencesThreshold(Tho).
%%% Utility computation
patternItemUtilityVectors(Tid,Item,I1,...,Il,Q) :- inCandidatePattern(Item),
    itemUtilityVector(Item, I1, ..., Il), inTransaction(Tid), item(Item, Tid, Position, Q).
intraPatternUtilityVector(Tid,I1,...,Il) :-
    &computeIntraPatternUtility[patternItemUtilityVectors](Tid,I1,...,Il).
occurrenceUtilityVector(Tid,I1,...,Il,T1,...,Tm,O1,...,On,C1,...,Co) :-
    inTransaction(Tid), intraPatternUtilityVector(Tid,I1,...,Il), transactionUtilityVector(Tid, T1, ..., Tm),
    transaction(Tid, ObjectId), objectUtilityVector(ObjectId, O1, ..., On), object(ObjectId, ContainerId),
    containerUtilityVector(ContainerId, C1, ..., Co).
:- &computeUtility[occurrenceUtilityVector](U), U < Thu, utilityThreshold(Thu).
• Classic guess-and-check approach.
• We generate one answer set for each valid pattern.
• The advanced utility functions are executed by means of external functions (e.g., in DLVHEX, WASP, clingo).
• Pattern validity criteria and filters can be applied by encoding them.
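To make the guess-and-check reading of the encoding concrete, here is a brute-force Python sketch of the same semantics. It is not how an ASP solver works and it is not the authors' code: the function `mine_patterns`, the profit-style utility and the tiny database are assumptions for illustration. The enumeration of item subsets plays the role of the choice rule, the two `continue` checks play the role of the two constraints, and the `utility` callable stands in for the external function. The empty pattern is skipped here for simplicity.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Transactions = Dict[str, Dict[str, int]]      # tid -> {item: quantity}
Pattern = Tuple[str, ...]


def mine_patterns(
    transactions: Transactions,
    utility: Callable[[Pattern, List[str], Transactions], float],
    th_o: int,
    th_u: float,
) -> List[Tuple[Pattern, List[str], float]]:
    """Enumerate candidate patterns and keep those passing both thresholds."""
    items = sorted({i for t in transactions.values() for i in t})
    valid = []
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):          # guess a candidate
            # Occurrences: transactions containing every item of the pattern.
            occ = [tid for tid, t in transactions.items()
                   if all(i in t for i in pattern)]
            if len(occ) < th_o:                            # occurrence check
                continue
            u = utility(pattern, occ, transactions)        # "external" utility
            if u < th_u:                                   # utility check
                continue
            valid.append((pattern, occ, u))
    return valid


# Hypothetical data: quantities per transaction and one price per item.
transactions = {
    "t1": {"a": 2, "b": 1},
    "t2": {"a": 1, "b": 3, "c": 1},
    "t3": {"c": 2},
}
price = {"a": 3, "b": 1, "c": 5}


def profit(pattern: Pattern, occ: List[str], db: Transactions) -> float:
    """Assumed utility: price times quantity, summed over all occurrences."""
    return sum(price[i] * db[tid][i] for tid in occ for i in pattern)


patterns = mine_patterns(transactions, profit, th_o=2, th_u=10)
```

With these made-up thresholds only the patterns `("c",)` and `("a", "b")` survive. The exhaustive loop is exponential in the number of items, which is one reason to delegate the search to a solver.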
7. Experimental Evaluation
• Dataset: aspect-based sentiment analysis of scientific reviews [1]
• Each sentence is annotated with different aspects; each aspect has a value.
Dataset, quantitative and qualitative analysis
[1] Chakraborty, S., Goyal, P., Mukherjee, A.: Aspect-based sentiment analysis of scientific reviews. In: JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020, pp. 207–216. ACM (2020)
• Quantitative analysis: running time, comparison with HUPM systems (in a classical setting).
• Qualitative analysis:
• Mining patterns where the advanced utility function is the Pearson correlation between the sentiment on a sentence aspect 𝑋 and the final decision on the corresponding paper.
• Similar results have been obtained where the advanced utility function is the multiple correlation.
The use case scenario in the e-HUPM model
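A correlation-based utility of the kind used in the qualitative analysis can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the evaluation code: the row layout (one row per pattern occurrence, holding a sentiment value and a decision value), the numeric encodings, and the convention of returning 0.0 for a constant column are all hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0          # assumed convention: a constant column has no correlation
    return cov / (dev_x * dev_y)


def correlation_utility(rows: List[Sequence[float]],
                        sentiment_col: int, decision_col: int) -> float:
    """Utility of a pattern: correlation between two columns of its occurrence rows."""
    xs = [row[sentiment_col] for row in rows]
    ys = [row[decision_col] for row in rows]
    return pearson(xs, ys)


# Hypothetical occurrences: (sentiment on aspect X in {-1, 0, 1}, decision in {0, 1}).
rows = [(1, 1), (-1, 0), (1, 1), (0, 0)]
u = correlation_utility(rows, sentiment_col=0, decision_col=1)
```

A pattern would then be kept only if `u` reaches the utility threshold, so the threshold directly expresses how strongly the sentiment on the aspect must track the final decision.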
8. Conclusion
• We introduced a general framework for HUPM with several extensions.
• The framework makes it possible to work with multi-dimensional data and with different utility measures.
• A versatile and modular ASP encoding has been developed.
• We employed a real use case on scientific reviews to carry out both quantitative and qualitative analyses.
• Facets and advanced utility functions help reduce the number of relevant patterns.
• Useful in providing deep insights into the data.
• Not an end point!
• Apply the framework to new contexts.
• Derive ad-hoc algorithms for particular classes of advanced utility functions.
9. Thanks for your attention!
These slides are available at https://bit.ly/ehupm-iclp21
Francesco Cauteruccio
Research Fellow @ DEMACS, University of Calabria
cauteruccio@mat.unical.it
francescocauteruccio.info
@finalfire