The Cornell project is creating tools for research on social and information networks based on a largely untapped dataset: the Internet Archive's collection of 40 billion Web pages. Snapshots of the Web have been captured and archived every two months for nearly ten years. The project will eventually make very large portions of this massive collection widely accessible for social science research. The flood of available on-line information – from corporate web pages to news groups and blogs – has the potential to open up new frontiers in research on the collective behavior of individuals and the diffusion of innovation. It also has practical applications for business and government, such as tracking market trends, the rise and fall of demand, and the spread of consumer preferences. Community watchdog groups will be able to track the spread of 'hate sites', and government agencies will be able to trace past and current uses of the Web for organizing and coordinating terrorist attacks. Developing these tools requires applying advanced research in natural language processing and machine learning. The project team also includes computer scientists with expertise in the privacy-preserving analysis of data – a basic challenge in making on-line data more readily accessible for research and policy applications in the social and information sciences.