The Web of Science (WoS) XML data provides the raw data behind the Web of Science Database for the years 1980-2019.
Terms and Conditions for Access and Use
- The Web of Science XML data is intended for non-commercial, academic research.
- The data is restricted to use by faculty, staff, students, and researchers at the Georgia Institute of Technology. Because the Data’s publisher must give prior approval for use or storage on devices physically located outside the United States, you must first seek and receive written approval from the Library’s Data Scientist Librarian for any use of the Data outside the United States. Commercial use of the data or derivatives is strictly prohibited.
- The data and derivatives may not be shared outside the Georgia Institute of Technology, including with other universities, institutions, government agencies, or corporate entities.
- You may no longer use the data set if your affiliation with Georgia Tech ends, including graduation, retirement, resignation, or termination.
This is a large and complex data set, and the GT Library will continue to evolve its support for the product. We are currently at Phase 0.
Phase 0: Spring 2021
- A Data Scientist Librarian will mediate and provide access to the data. End-users are responsible for abiding by the terms and conditions for access and use and for developing their own infrastructure to analyze the file. (See the code examples below.)
- To request access to the data please contact firstname.lastname@example.org.
Phase 1: Summer/Fall 2021 (estimate)
- Data access will be provided by direct download via a web portal. End-users will be responsible for abiding by the terms and conditions for access and use and for developing their own infrastructure to analyze the file. (See the code examples below.)
Phase 2: Fall 2021/Spring 2022 (estimate)
- Data access will be provided by direct download and via a database solution. End-users will be responsible for abiding by the terms and conditions for access and use. End-users will be able to use their own infrastructure or create structured queries via the database solution.
- The Library will provide end-user training for the database solution.
WoS Generic XML Parser (Indiana University)
The generic XML parser is designed to process XML records into an SQL script that can then be uploaded into a relational database. The parser converts XML into a series of insert statements based on a configuration file that specifies the tables and columns the user wants.
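The config-driven approach described above can be illustrated with a minimal Python sketch. It is not the Indiana University parser itself. The `CONFIG` layout, the `REC` record tag, the element paths, and the table and column names are all assumptions chosen for illustration and do not reflect the actual WoS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: table name -> {column name: path inside a record}.
CONFIG = {
    "publications": {
        "uid": "UID",
        "title": "title",
        "pub_year": "pub_info/year",
    },
}

# Tiny made-up sample; real WoS records are far larger and use a different schema.
SAMPLE_XML = """<records>
  <REC><UID>WOS:0001</UID><title>O'Brien's study</title><pub_info><year>1999</year></pub_info></REC>
  <REC><UID>WOS:0002</UID><title>Second paper</title><pub_info></pub_info></REC>
</records>"""


def sql_literal(value):
    """Render a value as an SQL string literal, or NULL when it is missing."""
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def record_to_inserts(record, config):
    """Build one INSERT statement per configured table for a single record."""
    statements = []
    for table, columns in config.items():
        names = ", ".join(columns)
        values = ", ".join(sql_literal(record.findtext(path)) for path in columns.values())
        statements.append(f"INSERT INTO {table} ({names}) VALUES ({values});")
    return statements


def xml_to_sql(xml_text, config, record_tag="REC"):
    """Convert every record in the XML text into a list of INSERT statements."""
    root = ET.fromstring(xml_text)
    script = []
    for record in root.iter(record_tag):
        script.extend(record_to_inserts(record, config))
    return script


if __name__ == "__main__":
    print("\n".join(xml_to_sql(SAMPLE_XML, CONFIG)))
```

Keeping the table and column mapping in a configuration structure means a user can change which fields are extracted without touching the conversion code.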
WoS MySQL Database builder (University of Chicago)
Creates and populates a MySQL database from raw Web of Science XML data.
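As a rough illustration of loading XML records straight into a database, the sketch below streams records with `iterparse` and inserts them with parameterized queries. It is not the University of Chicago tool. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the `REC`, `UID`, and `title` element names and the `publications` table are assumptions. Real WoS XML may also use namespaces, which this sample omits.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample data; element names are hypothetical.
SAMPLE_XML = b"""<records>
  <REC><UID>WOS:0001</UID><title>First paper</title></REC>
  <REC><UID>WOS:0002</UID><title>Second paper</title></REC>
</records>"""


def load_records(source, conn, record_tag="REC"):
    """Stream records from an XML file object into a table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publications (uid TEXT PRIMARY KEY, title TEXT)"
    )
    count = 0
    # iterparse yields each element once its closing tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            conn.execute(
                "INSERT INTO publications (uid, title) VALUES (?, ?)",
                (elem.findtext("UID"), elem.findtext("title")),
            )
            elem.clear()  # drop the record's children to limit memory use
            count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
loaded = load_records(io.BytesIO(SAMPLE_XML), conn)
rows = conn.execute("SELECT uid, title FROM publications ORDER BY uid").fetchall()
print(loaded, rows)
```

Parameterized queries leave quoting to the database driver. With a MySQL driver the structure would be similar, though placeholder syntax and connection setup differ.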
WoS Explorer (University of Wisconsin-Madison)
A simple utility written in the Python programming language to find and parse article records within the Web of Science data set. These scripts work with a JSON serialization of the data and exploit its one-record-per-line layout to stream through the data with a low memory footprint.
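The streaming idea can be sketched as follows. This is not WoS Explorer's own code. The field names `uid`, `pub_year`, and `title` are hypothetical, and the sample data is made up. Because each line holds one complete JSON record, a generator can decode and filter records one at a time without loading the whole file.

```python
import io
import json


def iter_records(lines):
    """Yield one decoded record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def find_articles(lines, year):
    """Yield records whose (hypothetical) pub_year field matches the given year."""
    for record in iter_records(lines):
        if record.get("pub_year") == year:
            yield record


# In-memory stand-in for a large one-record-per-line file.
SAMPLE = io.StringIO(
    '{"uid": "WOS:0001", "pub_year": 1999, "title": "First"}\n'
    '{"uid": "WOS:0002", "pub_year": 2005, "title": "Second"}\n'
    "\n"
    '{"uid": "WOS:0003", "pub_year": 1999, "title": "Third"}\n'
)

matches = [record["uid"] for record in find_articles(SAMPLE, 1999)]
print(matches)
```

With a real file, the same functions accept an open file handle in place of `SAMPLE`, since iterating over a text file yields one line at a time.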