Using Getty Data in Palladio

Palladio: http://hdlab.stanford.edu/palladio/

Dataset: Created with data from the Getty Provenance Index Databases, specifically the sales catalogs database. It is available for download at https://github.com/chnm/doingdh (file name: GPID_clean.csv).

About Palladio (from their home and about pages)

Palladio is a product of Stanford University’s Humanities + Design Research Lab and grew out of the Mapping the Republic of Letters project, which examines scholarly communities and networks of knowledge from 1500 to 1800. Palladio lets users create or open projects that visualize data. It is a tool for “reflective practice” and “an environment that supports thinking through data.”

With Palladio, users can create a new project by copying and pasting spreadsheets, dragging and dropping .csv, .tab, or .tsv files, or linking to a file in a public Dropbox folder. The data can then be visualized in several forms, including a map, graph, list, or gallery.

The map view displays any coordinate data as points on a map, with lines connecting related points. Points can be sized in proportion to their relative magnitude within the dataset. The nodes and links of a map visualization can be exported as .svg files.

The graph view visualizes the relationships between any two dimensions of a user’s data. Graph information is displayed as nodes connected by lines. As in the map view, nodes can be scaled to reflect their relative magnitude within the data. Graphs can be exported as .svg files.

In the list view, dimensions of a user’s data can be arranged to make customized lists. List visualizations can be exported as .csv files.

In the gallery view, data is displayed in a grid. Dimensions of a user’s data can also be linked to web content hosted on other sites, and items can be sorted by different dimensions.

Tutorial

  1. Go to http://hdlab.stanford.edu/palladio/ and click start.
  2. On this page, you can create a new project, load an existing project, or try Palladio with sample data. We’re creating a new project. You can copy and paste tabular data, drag and drop a file, or link to a file in a public Dropbox folder. We’re going to drag and drop a file of data that I’ve cleaned up from the Getty Provenance Index Databases, which is available for download on GitHub. Once the file has been dragged and dropped, click load.
  3. You are now looking at the data view. Here you can title the project, add details such as author, date, and description, and title your tabular data. You can also download or delete your data. You might see red dots next to certain fields. These mean that Palladio has detected possible errors in your data, or values in a field that are not all of the same type. I’m going to give the project the title Instances of the term ‘woman’ in Getty catalog sales, and I’ll label the dataset Getty data. If I were going to make this a public visualization, I would add more details, specifically information about the data and how I’ve manipulated it for my own purposes.
  4. Now I’m going to map my data. The first layer is the base layer. I’m going to add a layer of points that maps the coordinates contained in my dataset. I’ll click new layer, and under name I’ll enter auction house locations. For places, I’ll select coordinates, and for tooltip label, I’ll select sale location. Then I’ll click add layer. All of the coordinates are now mapped, and hovering over a point shows a label such as London, England (191), meaning that 191 paintings in the dataset were sold in London.
  5. Now I’m going to graph the data. I’m going to create a few different graphs to determine which combination of dimensions produces the most meaningful and useful network visualization. I’ll start with buyer as the source and auction house as the target. This graph shows all the people who bought paintings from the various auction houses. The dataset contains a disproportionate amount of information from Christie’s, and the network reflects this. If I switch the source to seller, the graph instead shows all the sellers of paintings and their relationships to the auction houses. Note that both of these graphs are 1) not particularly meaningful and 2) very crowded. It is possible to display too much information in a visualization, and both of these networks demonstrate that. A more meaningful network emerges when I use present location as the source and sale location as the target. This shows connections between where the paintings were sold and where they are now.
  6. In the table view, I can create a table from my data. I’ll set sale date as the row dimension and add buyer under dimensions. This produces a table showing which buyers purchased paintings on each date listed in the dataset.
  7. In the gallery view, I’m going to assign Artist Name 1 to the title field, title to the subtitle field, and sale date to the text field. This creates a gallery that leads with the artist, then lists a specific work by that artist and the date on which that work was sold. The gallery view works best if your dataset includes accompanying images or a field of image URLs.
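Palladio does all of this counting in the browser, so no code is needed to follow the steps above. For readers who want to check what a view is summarizing, here is a minimal Python sketch, using only the standard library, that reproduces three of the tallies: records per sale location (roughly the number shown in parentheses in the map tooltip), present location to sale location links (the graph from step 5), and buyers per sale date (roughly what the table in step 6 lists). The column names, the helper functions, and the three sample rows are hypothetical and invented for illustration; they are not the real headers or records of GPID_clean.csv, and this is not how Palladio itself is implemented.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real headers in GPID_clean.csv may differ.
SALE_LOCATION = "Sale Location"
PRESENT_LOCATION = "Present Location"
SALE_DATE = "Sale Date"
BUYER = "Buyer"

# Three invented rows for illustration only; these are not Getty records.
SAMPLE = """Sale Location,Present Location,Sale Date,Buyer
"London, England","Paris, France",1801-05-02,Smith
"London, England","London, England",1801-05-02,Jones
"Paris, France","Paris, France",1802-03-10,Smith
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def location_counts(rows):
    """Count records per sale location, skipping blank locations."""
    return Counter(row[SALE_LOCATION] for row in rows if row[SALE_LOCATION])


def edge_counts(rows, source, target):
    """Count (source, target) pairs between two dimensions, like graph links."""
    return Counter(
        (row[source], row[target]) for row in rows if row[source] and row[target]
    )


def buyers_by_date(rows):
    """Map each sale date to its distinct buyers, sorted alphabetically."""
    table = {}
    for row in rows:
        if row[SALE_DATE] and row[BUYER]:
            table.setdefault(row[SALE_DATE], set()).add(row[BUYER])
    return {date: sorted(buyers) for date, buyers in sorted(table.items())}


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    print(location_counts(rows))
    print(edge_counts(rows, PRESENT_LOCATION, SALE_LOCATION))
    print(buyers_by_date(rows))
```

Swapping the `source` and `target` arguments of `edge_counts` (for example, buyer and an auction house column) is the code equivalent of trying the different graphs in step 5.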

Try it out on your own

  1. Select a dataset. You can use the dataset I created from the Getty Provenance Index Databases, available for download at https://github.com/chnm/doingdh; an existing dataset of your own; or a new one you build yourself.
  2. Load that data into Palladio. Experiment to see which visualizations produce the most meaningful results with your dataset and which are least helpful. Why is that? What can you learn about your data from these visualizations? What do they show you that is new?

Things to keep in mind

  • Limitations of your dataset. The dataset I used in this demonstration originally did not have coordinates; I added them in order to show how the map view works. Keep in mind what your data is and what it describes. Not all data should be used in Palladio, and Palladio should not be used with all data.
  • Limitations of the tool. Not all problems can be solved using a hammer, and not all visualization tools will be the right one for your project. Palladio should not be the go-to tool for each and every project you have.

Where to go for help

A note on the data I’m using

I began with the Getty Provenance Index Databases and searched the sales catalogs database, which covers works of art sold at auction in Belgium, France, Germany, Great Britain, the Netherlands, and Scandinavia between 1650 and 1945. Because I am interested in how women are represented in art of that period, I searched for ‘woman,’ which produced 7,341 results. I downloaded the records as a CSV, kept the first 199 rows, and added a column of coordinates matching the sale location field.
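The trimming and geocoding steps described above can be sketched in Python with the standard library. This is a minimal illustration under stated assumptions, not the script actually used to build GPID_clean.csv: the `Sale Location` header, the `Coordinates` column name, the `add_coordinates` function, and the lookup table are all hypothetical, and the latitude/longitude values are approximate city centers. Each place is written as a single latitude,longitude value, the form Palladio’s map view expects for coordinates.

```python
import csv
import io
import itertools

# Hypothetical header for the sale location field; the actual export may differ.
SALE_LOCATION = "Sale Location"

# Hypothetical lookup from sale location to a "latitude,longitude" string.
# Values are approximate city centers, for illustration only.
COORDINATES = {
    "London, England": "51.5074,-0.1278",
    "Paris, France": "48.8566,2.3522",
}


def add_coordinates(csv_text, limit=199):
    """Keep the first `limit` data rows and append a Coordinates column.

    Locations missing from COORDINATES get an empty value, so they can be
    filled in by hand (or left unmapped) before loading into Palladio.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["Coordinates"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # islice stops after `limit` data rows; the header is not counted.
    for row in itertools.islice(reader, limit):
        row["Coordinates"] = COORDINATES.get(row[SALE_LOCATION], "")
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Invented sample rows, not real Getty records.
    sample = 'Title,Sale Location\nWoman Reading,"London, England"\nWoman Sewing,"Paris, France"\n'
    print(add_coordinates(sample))
```

Because each coordinate pair contains a comma, the `csv` module quotes it on output, so the resulting file still parses as one column per field when dropped into Palladio.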