
Custom Crawl Services

Internet Archive

Large-scale web harvests and national domain crawls performed for National Libraries, National Archives, preservation partners, research initiatives, and as part of special projects and custom crawling and research services.



National Library of Australia Crawls
collection
40,023
ITEMS
377.3M
VIEWS
collection
eye 377.3M
Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
Internet Archive Research Publication Crawls
collection
19,726
ITEMS
18.5M
VIEWS
by Internet Archive Web Group
collection
eye 18.5M
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in Wayback (https://web.archive.org). See also bibliographic metadata corpuses at https://archive.org/details/ia_biblio_metadata
National Library of Spain Crawls
collection
6,742
ITEMS
219.2M
VIEWS
collection
eye 219.2M
Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
Election Crawl 2012
collection
1,613
ITEMS
135.8M
VIEWS
collection
eye 135.8M
This crawl was performed in Summer & Fall of 2012 to archive the US Federal Elections.
Topics: US, federal, elections, web, 2012
Elections Web
collection
1,614
ITEMS
135.8M
VIEWS
collection
eye 135.8M
This collection contains collaborative Election crawls performed by IA.
Topics: elections, web
Bibliotheque Nationale de France Domain Crawls
collection
1,653
ITEMS
152.3M
VIEWS
collection
eye 152.3M
Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
National Archives and Records Administration
collection
11,218
ITEMS
92M
VIEWS
collection
eye 92M
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
National Library of New Zealand Crawls
collection
13,060
ITEMS
82.8M
VIEWS
collection
eye 82.8M
Crawls performed by Internet Archive on behalf of the National Library of New Zealand. This data is currently not publicly accessible.
National Library of Australia Crawl
collection
4,658
ITEMS
91.5M
VIEWS
collection
eye 91.5M
National Library of Australia crawl. This data is currently not publicly accessible.
bnf_2008
collection
715
ITEMS
75.7M
VIEWS
collection
eye 75.7M
This data is currently not publicly accessible.
National Library of Luxembourg
collection
9,151
ITEMS
35M
VIEWS
collection
eye 35M
National Library of Luxembourg
Topic: Luxembourg
nls_2009
collection
874
ITEMS
52.5M
VIEWS
collection
eye 52.5M
This data is currently not publicly accessible.
National Library of Israel
collection
3,605
ITEMS
42.9M
VIEWS
by dominic@archive.org
collection
eye 42.9M
Data collected by Internet Archive on behalf of the National Library of Israel. This data is currently not publicly accessible.
Topic: nlil
Olympics Web
collection
2,066
ITEMS
55.6M
VIEWS
collection
eye 55.6M
This collection includes all collaborative Olympic crawls performed by IA for the IIPC.
Topics: olympics, IIPC, web
nls_2010
collection
972
ITEMS
47.7M
VIEWS
collection
eye 47.7M
This data is currently not publicly accessible.
IA-BR-2018
collection
3,696
ITEMS
13.6M
VIEWS
collection
eye 13.6M
ccTLD crawl for .br domain
Topics: br, web, 2018, cctld
UNPAYWALL-PDF-CRAWL-2018-07
collection
1,241
ITEMS
3.9M
VIEWS
by Internet Archive Web Group
collection
eye 3.9M
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
NLA 2017 Domain Crawl
collection
4,877
ITEMS
34.4M
VIEWS
collection
eye 34.4M
Crawls performed by the Internet Archive in 2017 on behalf of the National Library of Australia.
Topic: nla web 2017
NLS_2011
collection
1,518
ITEMS
41.3M
VIEWS
collection
eye 41.3M
These crawls of the .es domain were performed in 2011 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2011
Olympics Crawl 2012
collection
703
ITEMS
41.9M
VIEWS
collection
eye 41.9M
These crawls were performed by IA on behalf of the IIPC in Summer 2012 during and prior to the 2012 Summer Olympics held in London, UK.
Topics: London, olympics, web, 2012, IIPC
nlaweb2016
collection
3,591
ITEMS
36M
VIEWS
collection
eye 36M
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2016.
Topics: nla, australia, web
NLA 2020 Domain Crawl
collection
6,153
ITEMS
5.7M
VIEWS
collection
eye 5.7M
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Australia.
Topics: nla, web, 2020
NARA 114th Congressional Crawl
collection
3,619
ITEMS
25.1M
VIEWS
collection
eye 25.1M
This crawl of online resources of the 114th US Congress was performed on behalf of The United States National Archives & Records Administration (NARA).
National Library of Ireland Crawls
collection
2,623
ITEMS
24.9M
VIEWS
collection
eye 24.9M
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
NARA 112th Congressional Crawl
collection
708
ITEMS
29.7M
VIEWS
collection
eye 29.7M
This crawl of online resources of the 112th US Congress was performed in Fall of 2012 and early winter of 2013 on behalf of NARA.
Topics: nara, 112th, web
bnf_2007
collection
321
ITEMS
31.4M
VIEWS
collection
eye 31.4M
This data is currently not publicly accessible.
NLA 2018 Domain Crawl
collection
5,641
ITEMS
22M
VIEWS
collection
eye 22M
Crawls performed by the Internet Archive in 2018 on behalf of the National Library of Australia.
Topics: nla, web, 2018
nla_2008
collection
631
ITEMS
30.6M
VIEWS
collection
eye 30.6M
This data is currently not publicly accessible.
MSAG-PDF-CRAWL-2017
collection
1,855
ITEMS
3.5M
VIEWS
by Internet Archive Web Group
collection
eye 3.5M
Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
NLA_2015
collection
3,088
ITEMS
36M
VIEWS
collection
eye 36M
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2015.
Topics: nla, web, 2015
NLA_2014
collection
2,189
ITEMS
34.5M
VIEWS
collection
eye 34.5M
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2014.
Topics: nla, web, 2014
IMLS Museum Universe Data File Crawl
collection
2,885
ITEMS
25.1M
VIEWS
collection
eye 25.1M
2015 crawl of museum websites listed in the IMLS Museum Universe Data File. More about the IMLS MUDF can be found at https://www.imls.gov/research-evaluation/data-collection/museum-universe-data-file
Topic: AIT
NLA 2019 Domain Crawl
collection
5,711
ITEMS
12.6M
VIEWS
collection
eye 12.6M
Crawls performed by the Internet Archive in 2019 on behalf of the National Library of Australia.
Topics: nla, web, 2019
National Library of Sweden
collection
310
ITEMS
24.7M
VIEWS
collection
eye 24.7M
Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
nl_sweden_2010
collection
309
ITEMS
24.7M
VIEWS
collection
eye 24.7M
This data is currently not publicly accessible.
NLNZ Domain Crawl 2018
collection
3,540
ITEMS
17M
VIEWS
collection
eye 17M
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2018.
Topics: web, nlnz, 2018
NLA 2013 Domain crawl
collection
2,826
ITEMS
34M
VIEWS
collection
eye 34M
This crawl of the .au domain was performed on behalf of the National Library of Australia in Spring of 2013.
Topics: nla, web, 2013
nla_2009
collection
568
ITEMS
26M
VIEWS
collection
eye 26M
This data is currently not publicly accessible.
Open Access Journal Test Crawl (2018)
collection
794
ITEMS
3.5M
VIEWS
by Internet Archive Web Group
collection
eye 3.5M
collection
eye 30.6M
Topics: bne, spain, web, 2013
UNPAYWALL-PDF-CRAWL-2019-04
collection
641
ITEMS
272,398
VIEWS
by Internet Archive Web Group
collection
eye 272,398
National Library of Ireland 2017 Web Archive
collection
2,510
ITEMS
14.1M
VIEWS
collection
eye 14.1M
2017 domain crawl for National Library of Ireland.
Topics: ireland, web
nla_2006
collection
384
ITEMS
22.8M
VIEWS
collection
eye 22.8M
This data is currently not publicly accessible.
bnf_2005
collection
265
ITEMS
23.3M
VIEWS
collection
eye 23.3M
This data is currently not publicly accessible.
NLS_2012
collection
776
ITEMS
27.4M
VIEWS
collection
eye 27.4M
This crawl of the .es domain was performed in 2012 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2012
IMLS Museum Universe 00001
collection
2,273
ITEMS
19.4M
VIEWS
collection
eye 19.4M
Crawl 00001 of the IMLS Museum Universe Data File.
nla_2007
collection
371
ITEMS
23.1M
VIEWS
collection
eye 23.1M
This data is currently not publicly accessible.
NLIL_2013
collection
1,187
ITEMS
20.7M
VIEWS
by dominic@archive.org
collection
eye 20.7M
This crawl of the .il domain was performed in 2013 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2013
bnf_2006
collection
323
ITEMS
20.6M
VIEWS
collection
eye 20.6M
This data is currently not publicly accessible.
NARA 115th Congressional Crawl
collection
2,886
ITEMS
8.5M
VIEWS
collection
eye 8.5M
This crawl of online resources of the 115th US Congress was performed on behalf of The United States National Archives & Records Administration (NARA).
Topic: crawldata
nla_2005
collection
175
ITEMS
19.4M
VIEWS
collection
eye 19.4M
This data is currently not publicly accessible.
NLNZ Domain Crawl 2019
collection
1,702
ITEMS
7.5M
VIEWS
collection
eye 7.5M
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2019.
Topics: web, nlnz, 2019
Wide Web Targeted PDF Crawling (2017)
collection
922
ITEMS
558,264
VIEWS
by Internet Archive Web Group
collection
eye 558,264
NLNZ Spring 2017 Domain Crawl
collection
2,389
ITEMS
11.6M
VIEWS
collection
eye 11.6M
This crawl of the .nz domain was performed on behalf of the National Library of New Zealand in Spring of 2017.
Topics: nlnz, web, 2017
nlnzweb2016
collection
1,513
ITEMS
16.3M
VIEWS
collection
eye 16.3M
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in January 2016.
Topics: new zealand, web, domain
OA-JOURNAL-CRAWL-2020-07
collection
1,923
ITEMS
1.4M
VIEWS
by Internet Archive Web Group
collection
eye 1.4M
NLNZ_2020
collection
2,105
ITEMS
3M
VIEWS
collection
eye 3M
NLIL_2015
collection
1,033
ITEMS
11.4M
VIEWS
collection
eye 11.4M
This crawl of the .il domain was performed in 2015 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2015
Biblioteca Nazionale Centrale di Firenze
collection
224
ITEMS
13.3M
VIEWS
collection
eye 13.3M
Data collected by Internet Archive on behalf of Biblioteca Nazionale Centrale di Firenze. This data is currently not publicly accessible.
nlnzweb2013
collection
921
ITEMS
15.6M
VIEWS
collection
eye 15.6M
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in February 2013.
Topics: web, domain
NLIL_2014
collection
971
ITEMS
10.7M
VIEWS
by dominic@archive.org
collection
eye 10.7M
This crawl of the .il domain was performed in 2014 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2014
BNL 2020-21 Winter Domain Crawl
collection
584
ITEMS
182,986
VIEWS
by Internet Archive
collection
eye 182,986
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in December 2020.
Topic: web
DIRECT-OA-CRAWL-2019
collection
2,566
ITEMS
448,217
VIEWS
by Internet Archive Web Group
collection
eye 448,217
Fed Site Closure Crawls
collection
1,858
ITEMS
11.9M
VIEWS
collection
eye 11.9M
These are crawls performed on US Federal Government web sites prior to their removal or merger with other resources.
Topics: federal, web, closures
Fed Site Closures 2011
collection
1,855
ITEMS
11.9M
VIEWS
collection
eye 11.9M
This crawl was performed in Fall of 2011 to archive Federal government web sites that were either slated for removal or for merger with other online resources.
Topics: federal, web, 2011
NARA 111th Congressional Crawl
collection
216
ITEMS
12M
VIEWS
collection
eye 12M
This crawl of online resources of the 111th Congress of the United States was performed in Fall of 2010 and Winter of 2011 on behalf of NARA.
Topics: nara, 111th, congress, web
NLS_elec2011
collection
280
ITEMS
11.1M
VIEWS
collection
eye 11.1M
This crawl was performed on behalf of the National Library of Spain (BNE) in Fall of 2011 to archive the National elections in Spain.
Topics: elections, web, 2011, spain, bne
UNPAYWALL-PDF-CRAWL-2020-03
collection
344
ITEMS
138,971
VIEWS
by Internet Archive Web Group
collection
eye 138,971
OA-DOI-CRAWL-2020-02
collection
278
ITEMS
277,553
VIEWS
by Internet Archive Web Group
collection
eye 277,553
NLA_2010
collection
180
ITEMS
10.8M
VIEWS
collection
eye 10.8M
This crawl was a domain-scale harvest of .au performed for the National Library of Australia in 2010.
Topics: nla, web, 2010
BNL 2017 Winter Domain Crawl
collection
1,181
ITEMS
6.1M
VIEWS
collection
eye 6.1M
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in December 2017 and January 2018.
Topics: web, 2017, 2018, luxembourg, BNL
KB Curated List Crawl 2019
collection
1,287
ITEMS
3.3M
VIEWS
collection
eye 3.3M
KB Curated List Crawl 2019. This data is not currently publicly accessible.
Topic: web
Indonesia 2017 Domain Crawl
collection
667
ITEMS
6.6M
VIEWS
collection
eye 6.6M
Crawls performed by the Internet Archive of the .id (Indonesia) web domain. This data is not currently publicly accessible.
Topics: web, 2017
IA-BR-2018
web
eye 39,389
favorite 0
comment 0
Internet Archive crawldata from ccTLD .br domain crawl, captured by wbgrp-svc230.us.archive.org:IA-BR-2018-12-20 from Fri Jan 18 04:42:04 PST 2019 to Thu Jan 17 22:08:11 PST 2019.
Topic: crawldata
NLIL_2020
collection
409
ITEMS
46,459
VIEWS
collection
eye 46,459
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Israel.
Topic: web