Summer 2012 Week 3: Big Data and Long Tails: Addressing the Cyber-Infrastructure Challenges for Research on a Budget
July 28, 2012 to August 4, 2012
Canyons Resort, 4000 Canyons Resort Drive, Park City, Utah
- Christine Borgman (University of California Los Angeles)
- Ian Foster (Argonne National Laboratory/University of Chicago)
- Bryan Heidorn (University of Arizona)
- Bill Howe (University of Washington)
- Carl Kesselman (University of Southern California/Information Sciences Institute)
Decade-long big-science projects such as the Human Genome Project, the Large Hadron Collider, the LIGO gravitational wave observatory, and the earth observation system have created datasets of unprecedented size that seem likely to revolutionize entire fields of science. Due to advances in sensors, computation, and storage, the cost and effort required to produce datasets of comparable scale have, in principle at least, been significantly reduced. As a result, we are seeing a proliferation of large-scale datasets, assembled in dozens of different fields spanning the physical sciences and engineering, medicine, and the social sciences. The scientific opportunities inherent in this “big data” revolution are enormous. But given finite resources, we now face the challenge of exploiting these opportunities at a per-project budget dramatically lower than that of the big science projects that pioneered advanced cyberinfrastructure and big data methods. Equally challenging is the need to promulgate new big data methods to communities that lack the expertise and resources possessed by big science projects. The “long tail” of science has data challenges even greater than those of “big science.” The large number of small projects and collaborations in the long tail produce ever larger volumes of data, yet lack the large shared instruments, data repositories, community standards for data structures and metadata, and critical mass of data management expertise.
Fortunately, concurrent with these trends, there have also been significant advances in the commercial computing environment: the acquisition and analysis of extremely large-scale datasets, driven by e-commerce, social networking, and the Web, have become commonplace, and the supporting tools and infrastructure have become commodities. Fifteen years ago, Jim Gray significantly altered the scientific infrastructure landscape by asserting and then proving that relational database technology, the workhorse of traditional enterprise systems, had significant unrecognized value to the scientific community. Subsequently, the use of relational databases has become prevalent in research infrastructure, often in lieu of large, special-purpose software development activities. We now appear to be at a similar inflection point: the technologies of search and Internet commerce, such as Hadoop-enabled scalable data servers, large-scale data analytics, software-as-a-service browser-based applications hosted on commodity clouds, and the semantic web, have the potential to significantly alter the way data is captured, analyzed, and shared in scientific investigations.
We are at a point where it will be beneficial to assess, much as Jim Gray did, the impact of these new technologies and to understand the big data and long-tail requirements of a range of scientific communities. The goal is to understand how these common tools and infrastructure apply to scientific data processes and, in the process, to put big data in the hands of a broader community of scientists. Many potential issues may get in the way. Data volumes are larger, by orders of magnitude in many cases. Budgets are often smaller. Uses are more idiosyncratic. Small research teams will have limited information technology and computer science expertise. These factors all make the long tail problem in many ways more difficult than the issues facing big science projects.
The goals of this workshop are to characterize, and where possible quantify, the needs of diverse scientific communities for “big data” technologies; to explore existing and new methods for meeting those needs in ways that can scale to large numbers of people (whether working alone, in small teams, or in larger aggregations) and to large, diverse, distributed data; and to identify foundational elements of a big data/long tail ecosystem that may accelerate progress towards meeting those needs. In addition to considering technologies, we will examine structural barriers to the effective use of big data, such as data sharing habits and skills gaps, and means of overcoming those barriers. The output from the workshop will be a position paper that identifies the major challenges and makes recommendations as to how they might be addressed.
The meeting will be organized as a series of “mini-workshops” which will focus on topics spanning specific communities of use, technology approaches and social, structural and organizational issues. We are recruiting a set of topic experts to lead these “mini-workshops”.
On-site ICiS staff member: TBD
Please plan to arrive on Saturday, July 28. The program will begin on Sunday morning, July 29, with a kick-off event and close on Saturday, August 4, by noon. Please check back for the agenda.
The Canyons resort is a fully staffed, full-service resort that provides complimentary high-speed wireless Internet in all guest rooms, public areas, and function spaces. Other amenities include use of the lodge fitness center, a heated outdoor pool and hot tub, and free underground heated parking.
ICiS sessions and lodging will be in the Silverado Lodge.
There are several restaurants located within walking distance: http://www.canyonsresort.com/dining.html
If you would like information on restaurants in the Park City area, the front desk will have a more complete list at check-in.
Summer Activities at the Canyons
Summer Camps and Child Care
Summer camps for children 6 to 12 years of age Monday – Friday are available at the Canyons. Child Care for children 6 weeks to 6 years is also available Monday – Friday. Canyons Little Adventures Childcare Center is a state-licensed childcare facility. Visit the link below for more information on the programs offered.
Shuttle Service (All Resort Express Airport Shuttle)
The All Resort Shuttle departs from Salt Lake City Airport every 30 to 40 minutes. The vans seat up to nine passengers and may make up to four stops at various destinations.
To be more eco-friendly, ICiS has negotiated an all-inclusive $43 rate for travel from and to the Silverado Lodge. All are encouraged to use this service. If you have other parties or family traveling with you, you will need to contact All Resort to let them know. Please reference group # 8070 when making your reservation.
You must make these reservations at least 48 hours prior to your arrival. To make your reservations please use one of the following options:
- Office: 435-649-3999 EXT 2
- Fax: 435-649-3549
- Web Portal: ICiS 2012 Special
MISSED ORIGINATION/CONNECTING FLIGHTS:
In the event of weather delays or missed connections, please call 800-457-9457 so that we may reschedule accordingly.
Individual reservation cancellations must be made 24 hours prior to your original pickup time and are subject to a 20% booking fee. Individual reservation cancellations received within 24 hours of your reservation are non-refundable. Event Shuttle cancellations must be made at least 72 hours prior to the start of the event. Shuttles cancelled within 72 hours of the start time will be charged the full amount contracted.
The following car rental companies have counters in Salt Lake City International Airport: Advantage, Alamo, Avis, Budget, Dollar, Enterprise, National, and Thrifty. Make your reservations via the company website or via a travel engine like Expedia.com, Travelocity.com, or Orbitz.com.
Since ICiS has a contract with All Resort Express Airport Shuttle to transport our attendees, rental vehicles may NOT be covered as part of your reimbursement. Please check with firstname.lastname@example.org if you plan to rent a vehicle.
Please note: Park City has a free local transit service.