Re: How to generate transaction dataset from program source code?
Date: October 14, 2011 03:04AM
Hello Xin,
I think that you can just assign a unique number to each function name.
For example, if you have some code like this:
void functionA(){
    int a, b = 5;
    int c = 6;
    functionA(a);
    functionB();
    functionC();
    functionD();
}
You could convert this to a transaction like this:
1 2 3 4
where 1 = functionA, 2 = functionB, 3 = functionC, 4 = functionD
You can create the mapping between numbers and functions dynamically. The actual numbers are not important. What is important is that you use a unique number for each function name, and that the same function always gets the same number.
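Here is a rough sketch of that dynamic mapping in Python. It assumes you have already extracted the list of called function names from the source code (for example with a parser). The names to_transaction and ids are just made up for this illustration.

```python
def to_transaction(calls, ids):
    """Convert a list of called function names into a transaction.

    `ids` is a dict shared by all the functions you convert, so that a
    given function name always maps to the same number.
    """
    items = set()
    for name in calls:
        # Give the name the next unused number the first time it is seen.
        if name not in ids:
            ids[name] = len(ids) + 1
        items.add(ids[name])
    # A transaction is a set of items, so duplicates are dropped.
    # Sorting just keeps the output easy to read.
    return sorted(items)


ids = {}
calls = ["functionA", "functionB", "functionC", "functionD"]
transaction = to_transaction(calls, ids)
print(" ".join(map(str, transaction)))  # prints: 1 2 3 4
```

Reusing the same ids dictionary for every function in the program is what keeps the numbering consistent across transactions.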
For sequences, you could do something similar. For example, consider this function:
void functionA(){
    int a, b = 5;
    int c = 6;
    functionA(a);
    functionB();
    functionC();
    functionD();
    functionA();
    functionC();
}
This could be translated as a sequence such as:
1, 2, 3, 4, 1, 3
where 1 = functionA, 2 = functionB, 3 = functionC, 4 = functionD
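As a small sketch in Python, the only difference from a transaction is that the order of the calls and the repeated calls are kept. Again, this assumes the called names were already extracted in call order, and to_sequence is a made-up name.

```python
def to_sequence(calls, ids):
    """Convert called function names into a sequence of numbers,
    keeping the call order and any repeated calls."""
    sequence = []
    for name in calls:
        # Give the name the next unused number the first time it is seen.
        if name not in ids:
            ids[name] = len(ids) + 1
        sequence.append(ids[name])
    return sequence


ids = {}
calls = ["functionA", "functionB", "functionC", "functionD",
         "functionA", "functionC"]
sequence = to_sequence(calls, ids)
print(", ".join(map(str, sequence)))  # prints: 1, 2, 3, 4, 1, 3
```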
For sequences, you could also use the basic blocks of the function to group some items together. For example, consider this function:
void functionA(){
    int a, b = 5;
    int c = 6;
    for(....) {
        functionA(a);
        functionB();
    }
    functionC();
    functionD();
    while(...){
        functionA();
        functionC();
    }
}
This could be translated as a sequence such as:
(1 2), 3, 4, (1 3)
where 1 = functionA, 2 = functionB, 3 = functionC, 4 = functionD
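A possible sketch of this grouping in Python is below. It assumes you already have the calls split into groups, one list per loop body and one single-element list for each call outside a loop. How you obtain these groups depends on your parser, and the names to_itemsets and show are made up for this example.

```python
def to_itemsets(blocks, ids):
    """Convert groups of called function names into a list of itemsets."""
    itemsets = []
    for block in blocks:
        items = set()
        for name in block:
            # Give the name the next unused number the first time it is seen.
            if name not in ids:
                ids[name] = len(ids) + 1
            items.add(ids[name])
        # Items inside one group are unordered, so store them sorted.
        itemsets.append(sorted(items))
    return itemsets


def show(itemsets):
    """Write groups of several items in parentheses, single items bare."""
    parts = []
    for items in itemsets:
        text = " ".join(map(str, items))
        parts.append("(" + text + ")" if len(items) > 1 else text)
    return ", ".join(parts)


ids = {}
blocks = [
    ["functionA", "functionB"],  # body of the for loop
    ["functionC"],
    ["functionD"],
    ["functionA", "functionC"],  # body of the while loop
]
itemsets = to_itemsets(blocks, ids)
print(show(itemsets))  # prints: (1 2), 3, 4, (1 3)
```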
I wrote them this way just to give you some examples. If you use the SPMF source code on my website for sequential pattern mining, the input format for this last sequence would be the following, where -1 marks the end of each itemset and -2 marks the end of the sequence:
1 2 -1 3 -1 4 -1 1 3 -1 -2
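If it helps, here is a short Python sketch that writes one sequence in this format, with one sequence per line of the input file. It assumes the sequence is already a list of itemsets of numbers, and to_spmf_line is a made-up name, not something from SPMF itself.

```python
def to_spmf_line(itemsets):
    """Format one sequence (a list of itemsets of integers) as a line
    of SPMF sequence input: -1 ends an itemset, -2 ends the sequence."""
    tokens = []
    for items in itemsets:
        # Drop duplicates and sort the items inside each itemset.
        tokens.extend(str(item) for item in sorted(set(items)))
        tokens.append("-1")
    tokens.append("-2")
    return " ".join(tokens)


line = to_spmf_line([[1, 2], [3], [4], [1, 3]])
print(line)  # prints: 1 2 -1 3 -1 4 -1 1 3 -1 -2
```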
Hope this helps,
Phil