Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
18.5M
Dec 19, 2017
by Internet Archive Web Group
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in Wayback (https://web.archive.org). See also bibliographic metadata corpuses at https://archive.org/details/ia_biblio_metadata
Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
This crawl was performed in Summer & Fall of 2012 to archive the US Federal Elections.
Topics: US, federal, elections, web, 2012
This collection contains collaborative Election crawls performed by IA.
Topics: elections, web
Crawls of the French domain space performed by Internet Archive on behalf of the Bibliothèque nationale de France. This data is currently not publicly accessible.
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawls performed by Internet Archive on behalf of the National Library of New Zealand. This data is currently not publicly accessible.
National Library of Australia crawl. This data is currently not publicly accessible.
This data is currently not publicly accessible.
National Library of Luxembourg
Topic: Luxembourg
This data is currently not publicly accessible.
42.9M
Oct 3, 2013
by dominic@archive.org
Data collected by Internet Archive on behalf of the National Library of Israel. This data is currently not publicly accessible.
Topic: nlil
This collection includes all collaborative Olympic crawls performed by IA for the IIPC.
Topics: olympics, IIPC, web
This data is currently not publicly accessible.
ccTLD crawl for .br domain
Topics: br, web, 2018, cctld
3.9M
Jul 17, 2018
by Internet Archive Web Group
Web archive data from a crawl of open access PDF URLs provided by Unpaywall.
Crawls performed by the Internet Archive in 2017 on behalf of the National Library of Australia.
Topic: nla web 2017
These crawls of the .es domain were performed in 2011 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2011
These crawls were performed by IA on behalf of the IIPC in Summer 2012, before and during the 2012 Summer Olympics held in London, UK.
Topics: London, olympics, web, 2012, IIPC
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2016.
Topics: nla, australia, web
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Australia.
Topics: nla, web, 2020
This crawl of online resources of the 114th US Congress was performed on behalf of The United States National Archives & Records Administration (NARA).
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
This crawl of online resources of the 112th US Congress was performed in Fall of 2012 and early winter of 2013 on behalf of NARA.
Topics: nara, 112th, web
This data is currently not publicly accessible.
Crawls performed by the Internet Archive in 2018 on behalf of the National Library of Australia.
Topics: nla, web, 2018
This data is currently not publicly accessible.
3.5M
Aug 4, 2017
by Internet Archive Web Group
Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2015.
Topics: nla, web, 2015
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2014.
Topics: nla, web, 2014
2015 crawl of museum websites listed in the IMLS Museum Universe Data File. More about the IMLS MUDF can be found at https://www.imls.gov/research-evaluation/data-collection/museum-universe-data-file
Topic: AIT
Crawls performed by the Internet Archive in 2019 on behalf of the National Library of Australia.
Topics: nla, web, 2019
Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
This data is currently not publicly accessible.
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2018.
Topics: web, nlnz, 2018
This crawl of the .au domain was performed on behalf of the National Library of Australia in Spring of 2013.
Topics: nla, web, 2013
This data is currently not publicly accessible.
3.5M
Apr 9, 2018
by Internet Archive Web Group
Topics: bne, spain, web, 2013
272,398
Apr 26, 2019
by Internet Archive Web Group
2017 domain crawl for National Library of Ireland.
Topics: ireland, web
This data is currently not publicly accessible.
This data is currently not publicly accessible.
This crawl of the .es domain was performed in 2012 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2012
Crawl 00001 of the IMLS Museum Universe Data File.
This data is currently not publicly accessible.
20.7M
Oct 3, 2013
by dominic@archive.org
This crawl of the .il domain was performed in 2013 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2013
This data is currently not publicly accessible.
This crawl of online resources of the 115th US Congress was performed on behalf of The United States National Archives & Records Administration (NARA).
Topic: crawldata
This data is currently not publicly accessible.
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2019.
Topics: web, nlnz, 2019
558,264
Sep 21, 2017
by Internet Archive Web Group
This crawl of the .nz domain was performed on behalf of the National Library of New Zealand in Spring of 2017.
Topics: nlnz, web, 2017
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in January 2016.
Topics: new zealand, web, domain
1.4M
Jul 6, 2020
by Internet Archive Web Group
This crawl of the .il domain was performed in 2015 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2015
Data collected by Internet Archive on behalf of Biblioteca Nazionale Centrale di Firenze. This data is currently not publicly accessible.
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in February 2013.
Topics: web, domain
10.7M
Aug 4, 2014
by dominic@archive.org
This crawl of the .il domain was performed in 2014 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2014
182,986
Dec 10, 2020
by Internet Archive
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in December 2020.
Topic: web
448,217
Feb 15, 2019
by Internet Archive Web Group
These are crawls of US Federal Government web sites performed prior to their removal or merger with other resources.
Topics: federal, web, closures
This crawl was performed in Fall of 2011 to archive Federal government web sites that were either slated for removal or for merger with other online resources.
Topics: federal, web, 2011
This crawl of online resources of the 111th Congress of the United States was performed in Fall of 2010 and Winter of 2011 on behalf of NARA.
Topics: nara, 111th, congress, web
This crawl was performed on behalf of the National Library of Spain (BNE) in Fall of 2011 to archive the National elections in Spain.
Topics: elections, web, 2011, spain, bne
138,971
Mar 5, 2020
by Internet Archive Web Group
277,553
Feb 5, 2020
by Internet Archive Web Group
This crawl was a domain scale harvest of .au performed for the National Library of Australia in 2010.
Topics: nla, web, 2010
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in December 2017 and January 2018.
Topics: web, 2017, 2018, luxembourg, BNL
KB Curated List Crawl 2019. This data is not currently publicly accessible.
Topic: web
Crawls performed by the Internet Archive of the .id (Indonesia) web domain. This data is not currently publicly accessible.
Topics: web, 2017
39,389
Jan 18, 2019
by Internet Archive
web
Internet Archive crawldata from ccTLD .br domain crawl, captured by wbgrp-svc230.us.archive.org:IA-BR-2018-12-20 from Fri Jan 18 04:42:04 PST 2019 to Thu Jan 17 22:08:11 PST 2019.
Topic: crawldata
Crawls performed by the Internet Archive in 2020 on behalf of the National Library of Israel.
Topic: web