The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum:
KDD CUP 2012 data mining competition
Date: January 13, 2012 02:45PM

The KDD Cup 2012 data mining competition has been announced (see the competition website).


2/20/2012 Competition announcement linked to KDD official site
3/1/2012 Registration opens (dataset ready for the public)
3/15/2012 Competition begins
6/1/2012 Competition ends (submission deadline)
6/5/2012 Results compiled
6/8/2012 Winners notified
8/12/2012 Workshop

The topic:

"This year's KDDCUP is sponsored by Tencent Inc., which is China's largest Internet company in terms of active users (over 700 Million users as of Jan. 2012). Tencent Inc. owns a full portfolio of popular products including instance messaging, email, and news portal, search engine, online games, blogging and micro-blogging in China, offering a rich opportunity to build user models for highly effective user intent prediction and result recommendation. This year's KDDCUP consists of two separate tasks. "

There will be two tasks:

Task 1. Social Network Mining on Microblogs (Weibo)

Tencent Weibo offers a wealth of social-networking information. For the 2012 KDD Cup, the released data is a sampled snapshot of Tencent Weibo users' preferences for various items: the recommendations shown to users and their follow-relation history. Items are organized in a hierarchy: each person, organization or group belongs to specific categories, and each category belongs to higher-level categories. In the competition, both users and items (persons, organizations and groups) are represented by anonymized numeric IDs, so that no identifying information is revealed. The data covers 10 million users and 50,000 items, with over 300 million recommendation records and about three million social-networking "following" actions. The privacy-protected user information is also very rich, and user activities are timestamped.

Task 1 is to predict which users a given user will follow, among all potential users.

Task 2. User Click Modeling based on Search Engine Log Data

Online advertising has been the financial backbone of the Internet industry for years. Three successful kinds of computational ad systems are search ad, contextual ad and social networking ad systems. Search ad systems retrieve and rank ads for a given query and display them together with the search engine's results. When a user clicks on an ad, the advertiser pays the search engine for the promotion. Ads are ranked to maximize users' satisfaction, advertisers' return on investment and the search engine's revenue. Contextual ad systems involve an additional role, the publishers, who own Internet properties such as Web sites, forums or mobile apps. Programs embedded in these properties request ads from the ad system, which finds ads that semantically match the content of the properties. Recently, a third kind of computational ad system, social network advertising, has gained a lot of attention; here the ad system ranks ads taking social relationships into account.

In all of the aforementioned systems, a key algorithmic component is predicting the click-through rate of ads (pCTR, the predicted CTR). This is because all such systems optimize monetization under economic rules (e.g., the Generalized Second Price (GSP) auction used by Google AdWords and others), and these rules require the ads' pCTR values both to rank ads and to price clicks. The closer the pCTR is to the true CTR, the more effective the monetization. User information, including demographics and historical behavior on search engines, e-commerce platforms, social networks and micro-blogs, is likely valuable for improving the accuracy of ad pCTR in all of the above systems.
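To make the role of pCTR in ranking and pricing more concrete, here is a minimal sketch of one common GSP variant, assumed here for illustration: ads are ranked by bid × pCTR, and each winner pays per click the smallest bid that would still keep its position (the next ad's score divided by its own pCTR). The `Ad` class, the `gsp_auction` function, the bids, pCTR values and reserve price are all hypothetical; real ad systems add further factors on top of this.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    bid: float   # maximum price the advertiser will pay per click (hypothetical)
    pctr: float  # predicted click-through rate, assumed to be in (0, 1]


def gsp_auction(ads, slots, reserve=0.0):
    """Rank ads by bid * pCTR and compute a GSP cost-per-click for each winner.

    Each winner pays the smallest bid that would still keep its rank:
    (score of the next-ranked ad) / (own pCTR), floored at the reserve price.
    """
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.pctr, reverse=True)
    outcome = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            price = max(reserve, nxt.bid * nxt.pctr / ad.pctr)
        else:
            price = reserve  # no competitor ranked below: pay only the reserve
        outcome.append((ad.ad_id, price))
    return outcome


if __name__ == "__main__":
    ads = [
        Ad("A", bid=2.0, pctr=0.05),  # score 0.10
        Ad("B", bid=4.0, pctr=0.02),  # score 0.08
        Ad("C", bid=1.0, pctr=0.04),  # score 0.04
    ]
    for ad_id, cpc in gsp_auction(ads, slots=2):
        print(ad_id, round(cpc, 2))
```

With these made-up numbers, A takes the top slot even though B bids twice as much, because A's higher pCTR gives it the larger bid × pCTR score. A pCTR that is off from the true CTR therefore distorts both the ranking and the prices charged.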

Task 2's aim is to accurately predict the ads' click-through rate in online computational ad systems.
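The announcement does not prescribe a model for Task 2. As a hedged illustration only, a common baseline (not part of the competition description) is a smoothed empirical CTR per ad: the posterior mean under a Beta prior, (clicks + m·k) / (impressions + k), where m is a prior CTR and k a number of pseudo-impressions. The `(ad_id, clicked)` log format, the function name and the default values m = 0.03 and k = 100 are hypothetical choices for this sketch.

```python
from collections import defaultdict


def smoothed_ctr_by_ad(log, prior_ctr=0.03, prior_strength=100.0):
    """Estimate per-ad CTR from (ad_id, clicked) impression records.

    Returns the posterior mean under a Beta prior with mean `prior_ctr` and
    `prior_strength` pseudo-impressions:
        (clicks + prior_ctr * k) / (impressions + k)
    so ads with few impressions are pulled toward the prior instead of 0 or 1.
    """
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for ad_id, clicked in log:
        impressions[ad_id] += 1
        clicks[ad_id] += int(clicked)
    k = prior_strength
    return {
        ad: (clicks[ad] + prior_ctr * k) / (impressions[ad] + k)
        for ad in impressions
    }


if __name__ == "__main__":
    # Hypothetical log: ad1 was shown 3 times with no clicks,
    # ad2 was shown 100 times and clicked twice.
    log = [("ad1", False)] * 3 + [("ad2", True)] * 2 + [("ad2", False)] * 98
    for ad, ctr in smoothed_ctr_by_ad(log).items():
        print(ad, round(ctr, 4))
```

The raw CTR of ad1 would be exactly 0 after only three impressions; smoothing keeps its estimate near the prior until enough evidence accumulates, while setting `prior_strength=0.0` recovers the raw clicks/impressions ratio.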

See the KDD Cup 2012 webpage for more details:

Edited 3 time(s). Last edit at 01/18/2012 04:16AM by webmasterphilfv.


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).