Project Data

Data Assessment:

1. This is a “humanities” data set because it focuses on the movement of art to the Getty during World War II, between 1939 and 1945. Because I am approaching the data as a historian, I am focusing on which countries supplied art to the Getty in which years, looking for a correlation between Nazi occupation and increased art sales.
2. My data comes from the Getty Provenance project, and the data is open for public use as long as the proper license and credit are given.
3. The spatial component of my data comes from the countries of origin of each object sold during a specific year.
4. My data is in an Excel spreadsheet.
5. My data needed to be pared down from the original 40000+ rows the Getty Provenance project provided to include only 1939-1945. The objects with unknown provenance had to be deleted, and the artist nationality (country of origin) had to be changed from an adjective to a noun (e.g. “French” to “France”).


Data Clean-up:

1. Data was pared down from the original 40000+ rows of data provided by the Getty Provenance Project.
2. Data that was kept included only the years 1939-1945 and the columns “Artist Nationality” and “Year of Sale.” The years were then separated into 7 different Excel sheets, one per year.
3. Nationality adjectives such as “French” had to be changed to country names such as “France” using Find and Replace so that the data could be geocoded in the Carto program.
4. Find and Replace was also used to get a count for the number of objects. For example, if “France” appeared 15 times, that meant that 15 different objects were sold from France to the Getty during that specific year.
5. The data was then organized into three columns: Country of Origin, Number of Objects, and Year of Sale.
6. In order for the Year of Sale column to be read as a date by Carto, each year had to have a month and a day as well to match the MM/DD/YYYY format. Each country was paired with the date 01/01/YYYY depending on the year of sale.
7. Once the data was cleaned, it was combined into one large Excel sheet covering the years 1939-1945.
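
The clean-up above was done by hand in Excel with Find and Replace. As an illustration only, the same steps could be sketched in Python. This is a minimal sketch under stated assumptions, not the method used for this project: the function name summarize_sales, the lookup table NATIONALITY_TO_COUNTRY, and the sample rows are all hypothetical, and the column names are taken from the steps above.

```python
from collections import Counter

# Hypothetical adjective-to-country lookup (step 3); a real run would need
# an entry for every nationality that appears in the data.
NATIONALITY_TO_COUNTRY = {
    "French": "France",
    "Dutch": "Netherlands",
    "German": "Germany",
}


def summarize_sales(rows, first_year=1939, last_year=1945):
    """Count objects per country and year, with the year written as 01/01/YYYY."""
    counts = Counter()
    for row in rows:
        year_text = row["Year of Sale"].strip()
        country = NATIONALITY_TO_COUNTRY.get(row["Artist Nationality"].strip())
        # Drop rows with a non-numeric year or an unknown/unmapped nationality.
        if country is None or not year_text.isdigit():
            continue
        year = int(year_text)
        # Keep only the war years (steps 1 and 2).
        if not first_year <= year <= last_year:
            continue
        counts[(country, year)] += 1  # step 4: one row equals one object sold

    # Steps 5 and 6: three columns, with the year padded to MM/DD/YYYY.
    ordered = sorted(counts.items(), key=lambda item: (item[0][1], item[0][0]))
    return [
        {
            "Country of Origin": country,
            "Number of Objects": number,
            "Year of Sale": f"01/01/{year}",
        }
        for (country, year), number in ordered
    ]


# Made-up sample rows, for illustration only (not Getty data).
sample_rows = [
    {"Artist Nationality": "French", "Year of Sale": "1940"},
    {"Artist Nationality": "French", "Year of Sale": "1940"},
    {"Artist Nationality": "Dutch", "Year of Sale": "1941"},
    {"Artist Nationality": "Unknown", "Year of Sale": "1941"},
    {"Artist Nationality": "German", "Year of Sale": "1950"},
]
summary = summarize_sales(sample_rows)
```

Because the result is already a single list covering every year, the last step (combining the yearly sheets) is not needed in this version. The rows could be written out with Python's csv.DictWriter for upload to Carto.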

Data License:


The Trust makes the Getty Provenance Index® datasets available under the least restrictive open license possible; however, please check the README documentation specific to each dataset, as rights may vary between datasets. If you create a derivative dataset from a Getty Provenance Index® dataset, we ask that you consider releasing the derivative under the least restrictive license possible.


Getty Research Institute®, CSV exports of the Getty Provenance Index® (November 21, 2016),


Data Access:

Go to the link below to download the Excel data sheet.