- Keynote: Jiawei Han
- Keynote: Amr El Abbadi
- Keynote: Yufei Tao
- Keynote: Wen-Syan Li
- Keynote: Ricky Sun
- Industry Keynote: Shin-Ming Liu
- Industry Keynote: Junhua Zhu
WAIM 2014 Keynote Talk: Construction and Mining of Heterogeneous Information Networks: Will This Be a Key to Web-Aged Information Management and Mining?
Abstract: In Web-aged information management, massive amounts of data are unstructured, noisy, and untrustworthy, but are interconnected, forming gigantic information networks. By structuring such unstructured data into multiple types, these networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, and Web-based database systems, can be structured into typed, heterogeneous social and information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, and medications, and links such as visits, diagnoses, and treatments are intertwined, providing rich information and forming heterogeneous information networks. Effective analysis of large-scale heterogeneous information networks poses an interesting but critical challenge.
In this talk, we present a set of scenarios for the construction and mining of heterogeneous information networks. We show that relatively structured heterogeneous information networks can be constructed from unstructured, interconnected data, and that such relatively structured, heterogeneous networks bring tremendous benefits for data management and data mining. Departing from many existing network models that view data as homogeneous graphs or networks, the semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and can uncover surprisingly rich knowledge from interconnected data. This heterogeneous network modeling will lead to the discovery of a set of new principles and methodologies for mining interconnected data. The themes to be covered include (1) construction of heterogeneous information networks from unstructured data, (2) search and recommendation using meta paths, and (3) mining heterogeneous information networks: clustering, ranking, classification, and relationship prediction. We will also point out some promising research directions and argue that the construction and mining of heterogeneous information networks could be a key to Web-aged information management and mining.
Summer School Talk: Construction, Exploration and Mining of Semi-Structured, Heterogeneous Information Networks
Abstract: People and informational objects are interconnected, forming gigantic, integrated information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, and database systems, can be structured into typed, semi-structured, heterogeneous information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, and medications, and links such as visits, diagnoses, and treatments are intertwined, providing rich information and forming heterogeneous information networks. Effective construction, exploration, and analysis of large-scale heterogeneous information networks pose an interesting but critical challenge.
In this talk, we first present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising frontier in data mining research. Departing from many existing network models that view data as homogeneous graphs or networks, the semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and can uncover surprisingly rich knowledge from interconnected data. This heterogeneous network modeling will lead to the discovery of a set of new principles and methodologies for mining and exploring interconnected data, such as rank-based clustering and classification, meta path-based similarity search, and meta path-based link/relationship prediction. Then we discuss our recent progress on the construction of quality semi-structured heterogeneous information networks from unstructured data. We will also point out some promising research directions in this domain.
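The meta path-based similarity search mentioned above can be made concrete with a small sketch. The normalization below follows the PathSim measure introduced by the speaker's group, counted along the symmetric author-paper-author (A-P-A) meta path; the toy author-paper graph and all names are invented for illustration, not data from the talk.

```python
# Minimal PathSim-style similarity sketch over a toy heterogeneous network.
# PathSim(a, b) = 2 * |path instances a~>b| / (|a~>a| + |b~>b|),
# counted along the symmetric meta path A-P-A (co-authorship).

# Hypothetical author -> papers adjacency.
writes = {
    "alice": {"p1", "p2", "p3"},
    "bob":   {"p2", "p3"},
    "carol": {"p3", "p4"},
}

def apa_path_count(a, b):
    """Number of A-P-A path instances between authors a and b."""
    return len(writes[a] & writes[b])

def pathsim(a, b):
    """Path count normalized by each author's self-path count ("visibility")."""
    denom = apa_path_count(a, a) + apa_path_count(b, b)
    return 2.0 * apa_path_count(a, b) / denom if denom else 0.0

for other in ("bob", "carol"):
    print(other, round(pathsim("alice", other), 3))
```

The normalization is what distinguishes a meta path-based measure from a raw path count: bob shares as many papers with alice as a much more prolific author would, yet scores higher because the shared paths make up a larger fraction of his total connectivity.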
Short Bio. Jiawei Han, Abel Bliss Professor of Computer Science, University of Illinois at Urbana-Champaign. His research covers data mining, information network analysis, database systems, and data warehousing, with over 600 journal and conference publications. He has chaired or served on many program committees of international conferences, including serving as PC co-chair for KDD, SDM, and ICDM, and as Americas Coordinator for VLDB. He also served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data and as the Director of the Information Network Academic Research Center supported by the U.S. Army Research Lab. He is a Fellow of ACM and a Fellow of IEEE, and received the 2004 ACM SIGKDD Innovations Award, the 2005 IEEE Computer Society Technical Achievement Award, the 2009 IEEE Computer Society Wallace McDowell Award, and the 2011 Daniel C. Drucker Eminent Faculty Award at UIUC. His book “Data Mining: Concepts and Techniques” is widely used as a textbook worldwide.
WAIM 2014 Keynote Talk: Big Data Challenges: From Scalable Fault-tolerant Data Management to On-Line Social Media Applications
Abstract: Data is everywhere, and is being used in all sorts of ways to derive valuable information. Data is being generated, stored, managed, and analyzed all around us at unprecedented rates. This data has huge VOLUME, arrives at a very high VELOCITY, and comes in a VARIETY of different forms and shapes. In fact, big data applications have become indispensable in fields as diverse as physics, political science, commerce, social science, and geographic applications. The three “V”s (Volume, Velocity, and Variety) have posed fundamental challenges to the ways traditional data management systems are designed and implemented, as well as to how data is consumed and analyzed. They have also given rise to the era of large, globally dispersed data centers. In this talk, we will provide an overview of some of the basic principles of Big Data, and will explore the volume challenges they pose to traditional data management systems. We will then discuss some of the velocity research challenges that big data presents in large-scale social media, especially in understanding, managing, and analyzing the diffusion of information in diverse settings.
Summer School Talk: The Distributed and Database Foundations of Cloud-based Data Management
Abstract: Over the past few decades, database and distributed systems researchers have made significant advances in the development of protocols and techniques to provide data management solutions that carefully balance three major requirements when dealing with critical data: high availability, fault tolerance, and data consistency. However, over the past few years, the data requirements, in terms of data availability and system scalability, for Internet-scale enterprises that provide services to millions of users have been unprecedented. Cloud computing has emerged as an extremely successful paradigm for deploying Internet and Web-based applications. Scalability, elasticity, pay-per-use pricing, and autonomic control of large-scale operations are the major reasons for the successful widespread adoption of cloud infrastructures. In this seminar, we will first discuss some of the critical distributed systems and database protocols that are essential for understanding current large-scale data management. We will then analyze the design choices that allowed modern NoSQL data management systems (key-value stores) to achieve orders-of-magnitude higher scalability compared to traditional databases, and lay the foundations for the integration of consistent transactional semantics for data management in the Cloud.
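One design choice commonly behind the scalability of the key-value stores discussed above is partitioning keys across nodes with consistent hashing, so that adding or removing a node remaps only a small fraction of keys. The ring below is a minimal illustrative sketch under that general idea; the node names are hypothetical and it does not reproduce any particular system's implementation.

```python
# Minimal consistent-hashing ring: a common partitioning scheme behind
# horizontally scalable key-value stores. Node names are hypothetical.
import bisect
import hashlib

def h(s):
    """Map a string to a stable integer position on the hash ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node gets many virtual positions to smooth out load.
        self.points = sorted((h(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))
        self.keys = [p for p, _ in self.points]

    def lookup(self, key):
        """Owner = first ring position clockwise from the key's hash."""
        i = bisect.bisect(self.keys, h(key)) % len(self.keys)
        return self.points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))  # deterministic owner for this key
```

Because a node's departure only hands its ring segments to the clockwise neighbors, rebalancing touches O(keys/nodes) data rather than rehashing everything, which is what makes elastic scale-out practical.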
Short Bio. Amr El Abbadi is a Professor of Computer Science at the University of California, Santa Barbara. He received his B.Eng. from Alexandria University, Egypt, and his Ph.D. from Cornell University. Prof. El Abbadi is an ACM Fellow, AAAS Fellow, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011. He has served as a journal editor for several database journals, including, currently, The VLDB Journal, IEEE Transactions on Computers, and The Computer Journal. He has been Program Chair for multiple database and distributed systems conferences, most recently SIGSPATIAL GIS 2010, the ACM Symposium on Cloud Computing (SoCC) 2011, COMAD (India) 2012, and the first ACM Conference on Social Networks (COSN) 2013. He currently serves on the executive committee of the IEEE Technical Committee on Data Engineering (TCDE) and was a board member of the VLDB Endowment from 2002 to 2008. In 2007, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. In 2013, his student Sudipto Das received the SIGMOD Jim Gray Doctoral Dissertation Award. He has published over 300 articles in databases and distributed systems and has supervised over 30 PhD students.
WAIM 2014 Keynote Talk: Query Sampling: Rejuvenated
Abstract: Query sampling — a classic technique in database systems — aims to report only a sample set of the objects satisfying a query condition. In this talk, we will revisit this technique in the big data context, and endow it with a new feature: independence, namely, the sample set returned for a query should be independent of the sample sets of all previous queries. We will discuss new data structures that support this form of query sampling with excellent theoretical performance guarantees.
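The independence requirement can be pinned down with a brute-force sketch: each query draws a fresh random sample of the qualifying objects, so the returned set is independent of every previous answer. The data structures in the talk achieve this without scanning all qualifying objects; the code below, over an invented one-dimensional dataset, is only meant to illustrate the semantics, not an efficient solution.

```python
# Brute-force illustration of independent query sampling: every call draws
# a fresh sample of the objects satisfying the range predicate, so the
# result is independent of all previous queries' samples. (Hypothetical
# 1-D data; a real solution avoids enumerating the qualifying set.)
import random

data = list(range(100))  # hypothetical one-dimensional dataset

def range_sample(lo, hi, k, rng=random):
    """Return k objects sampled uniformly from those in [lo, hi]."""
    qualifying = [x for x in data if lo <= x <= hi]
    k = min(k, len(qualifying))
    return rng.sample(qualifying, k)  # fresh, independent draw per query

s1 = range_sample(10, 50, 5)
s2 = range_sample(10, 50, 5)  # independent of s1, so it may differ
```

The interesting part, which this sketch sidesteps, is answering such a query in time proportional to the sample size rather than the number of qualifying objects, while preserving the per-query independence.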
Summer School Talk: Skylines: Forgetting Heuristics
Abstract: The skyline operator, since its introduction to the database community, has been thoroughly studied. However, the database community is starting to tire of the plethora of existing heuristic algorithms claimed to be effective on “real data” but lacking rigorous theoretical performance guarantees. In this lecture, we will focus on state-of-the-art algorithms that solve this problem with excellent bounds on I/O efficiency. As a side product, the lecture will also unveil much of the “magic” behind provably efficient I/O algorithms, so that students may be able to start designing such algorithms in their own research right away.
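To make the problem concrete, here is a minimal in-memory skyline computation (minimizing every dimension). This block-nested-loop style scan is exactly the kind of simple algorithm the lecture contrasts with I/O-efficient methods carrying worst-case guarantees; the hotel points are invented for illustration.

```python
# Simple in-memory skyline (minimize all dimensions): keep every point
# not dominated by any other. Quadratic in the worst case, and oblivious
# to I/O cost -- the baseline the lecture's provably efficient
# algorithms improve upon.

def dominates(p, q):
    """p dominates q: p is <= q in every dimension and < in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue                                          # p is dominated
        result = [q for q in result if not dominates(p, q)]   # p prunes others
        result.append(p)
    return result

hotels = [(3, 80), (1, 120), (2, 90), (4, 70), (2, 100)]  # (distance, price)
print(skyline(hotels))
```

Here (2, 100) falls out of the skyline because (2, 90) is as close and strictly cheaper; every remaining hotel is a best trade-off in at least one direction.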
Short Bio. Yufei Tao is a full professor in the Department of Computer Science and Engineering, Chinese University of Hong Kong (CUHK). Before joining CUHK in 2006, he was a Visiting Scientist at Carnegie Mellon University during 2002-2003, and an Assistant Professor at City University of Hong Kong during 2003-2006. From 2011 to 2013, he was simultaneously a Visiting Professor, under the World Class University program of the Korean government, in the Division of Web Science and Technology, Korea Advanced Institute of Science and Technology (KAIST), Korea. He obtained his PhD degree from the Hong Kong University of Science and Technology in 2002, proudly under the supervision of Prof. Dimitris Papadias. He received the best paper award at SIGMOD 2013, and a Hong Kong Young Scientist Award in 2002. He is an associate editor of ACM Transactions on Database Systems (TODS) and of IEEE Transactions on Knowledge and Data Engineering (TKDE). He was a PC co-chair of the International Conference on Data Engineering (ICDE) 2014, and a PC co-chair of the International Symposium on Spatial and Temporal Databases (SSTD) 2011. He has worked extensively on indexing and query optimization in spatial and temporal databases. His current research aims to leverage the theory of data structures and computational geometry to develop practical algorithms with non-trivial theoretical guarantees, particularly in dealing with massive datasets that do not fit in memory.
WAIM 2014 Keynote Talk: The New Role of DBMS in Enterprise Application Development
Abstract: Historically, the DBMS was widely adopted in the development of traditional enterprise applications. Yet, in order to maximize cross-platform support and compatibility, most application logic was implemented in the middle layer behind a general data access adapter, so the use of the DBMS was limited to CRUD access. Nowadays, the role of the DBMS has undergone a fundamental shift. In the era of Big Data, more and more computation and analysis must be undertaken by the DBMS itself for extreme performance. Hence, the DBMS has been transformed from a pure data repository into a new computation platform. Meanwhile, to avoid mass data movement, the traditional three-tier architecture has been simplified into a two-tier one via a new programming model.
To accommodate these trends, SAP HANA, a real-time data platform with revolutionary in-memory and columnar table characteristics, is introduced. This breakthrough data and application platform brings the calculation logic close to the data, combines parallel computing with an algorithm optimization framework, enhances both existing and new applications with extreme performance, and makes previously impossible business scenarios possible. We will also discuss several design topics for the data platform during this transition, e.g., the comparison between general-purpose and specialized platforms, how to design online/nearline/offline storage, and how to achieve balance in a single/dual stack architecture. In this lecture, we will have an in-depth and detailed discussion of the new role of the DBMS in enterprise application development, and will provide a case study of SAP HANA for extreme applications. In the end, several innovative project demos will be shown.
Short Bio: Wen-Syan Li is a Vice President of SAP and the head of SAP Design & New Applications in China. He is responsible for building predictive analytics capabilities and applications on HANA and strategic projects, as well as supporting HANA’s ecosystem, startup program, customer adoption, and cloud infrastructures in China. He received his Ph.D. in Computer Science from Northwestern University (USA). He also holds an MBA degree in Finance. His interests include databases, in-memory computing, data mining, optimization/scheduling, and developing novel extreme applications. Before joining SAP, he was with the IBM Almaden Research Center (USA). Dr. Li has published more than 100 journal articles and conference papers in various areas, and holds 60+ granted/pending US patents.
WAIM 2014 Keynote Talk: A New Approach to Big Data
Abstract: Because of the 3-V characteristics of Big Data (volume, variety and velocity), traditional architectures and approaches are not suitable for Big Data. In addition, many organizations find it difficult to determine how and where Big Data can help them transform their business. How do they identify the business areas where Big Data can make the most impact? What data sources do they need? How do they ensure they design architectures that are compatible with their existing environment? These challenges require a new approach, which consists of: a scale-out and flexible storage and compute platform, an analytics environment that can support all data types, a collaborative environment for the data science team, a way to develop Big Data applications quickly, and trusted advisors and experienced architects. This talk covers a comprehensive Big Data solution that includes scale-out storage, a unified analytics platform, a business process-modelling tool and application development services. These technologies, along with data science, consulting and education services, enable organizations to use their Big Data to achieve new levels of efficiency, agility, and innovation.
Short Bio: Ricky Sun is CTO of the EMC China R&D Center and Executive Director of EMC Labs China & Office of the CTO, responsible for the China R&D Center’s product and solution strategies and roadmaps, its Technical Committee, the OCTO’s core SDN project and Open Innovation Initiatives, as well as EMC’s China university funding programs. Ricky joined EMC in 2012. Before joining EMC, Ricky worked for Microsoft ARD’s China Innovation Group, where, in his PM role, he drove cross-function and cross-team innovation on mobile Internet, cloud computing, big data, and IoT projects. Prior to Microsoft, Ricky spent 13 years in the San Francisco Bay Area, where he worked for Yahoo! as an Architect and PM on its Portal & Ads BI products, and at three startups spanning network security, network management, Web 2.0, and fast-boot operating systems. Ricky received a BS in Computer Science from Tsinghua University in 1997, and an MS in Computer Engineering, with distinction, from SCU in 1999.
Industry Keynote Talks
Director, IOT Lab of Intel Labs China
Industry Keynote Talk: End-to-End Internet of Things Solutions and Architecture
Abstract: This talk covers research studies conducted in the China-Intel IoT Joint Labs. Our research objective is to solve problems with attributes specific to China. We use vertically integrated IoT solutions to explore interaction issues across the sensors/actuators/edge servers at the front end, the connectivity components, the data storage and data analytics engines at the back end, and the application interfaces for software solutions and smart services. We aim to deliver reference designs for several IoT solutions.
Short Bio: Shin-Ming Liu leads the IOT Lab of Intel Labs China. He is also an Associate Lab Director of the China-Intel IOT Joint Lab. Before joining Intel, he managed the kernel development team of the HP Unified Storage OS and the HP-UX Compiler and Performance Tool Lab. He also worked on the Google Search “Giga Indexer” project during its start-up phase, and on three generations of high-performance compilers at MIPS/SGI. Shin-Ming holds 11 US patents and a Master’s degree in Computer Science.
Industry Keynote Talk: End-to-End Internet of Things Solutions and Architecture
Abstract: Currently, the focus of big data research is shifting from infrastructure to value. How to obtain insights promptly and efficiently from massive data has become a critical issue for next-generation big data processing platforms. In order to extract information in real time, the front-end OLTP system and the back-end OLAP system must be fused together. Meanwhile, emerging hardware may disrupt conventional computer architecture, and hence database architecture should be redesigned to meet this trend. In this talk, we will explore how to utilize modern hardware to build an OLTP/OLAP converged system for enterprise data management and analytics, using a software-hardware co-design approach.
Short Bio: Dr. Junhua Zhu is the Chief Architect of the CloudDB project in Shannon Lab, Huawei Central Research Institute. In that role, his main responsibility is leading a research team to design and build a large-scale real-time data analytics platform targeting telecommunication scenarios with modern hardware, e.g., NVM and many-core processors.