Project Data

Data Assessment

For this activity, I uploaded my clean Post Civil War data into OpenRefine.
1. The “Text Facet” and “Numeric Facet” options show that my data can easily be narrowed down both by date and by name of the individual. For this specific data set, it was very helpful to see how many people were born in each year.
2. When I uploaded the original CSV file of the SJ Cemetery, I had to figure out how to get only the birth year out of the birthdates listed for each person. This meant that I needed to split the columns on the commas and forward slashes often used in writing dates (,/). This had to be done in OpenRefine. Once all of my columns were split to isolate the birth year, I was left with double the columns I had before. I then had to export this into Excel. I was able to combine all of the columns with birth years into one column by copying and pasting data and shifting specific columns to the right.
3. This method would best be used on columns such as “Monument Condition” that could have the same word or words (good, good condition) used to mean the same thing.
4. The birthdate column needed to be greatly modified for consistency, because the dates were written in different formats and, occasionally, a birthdate was not given.
5. My research was less about the people themselves and more about the years and chronology of births.
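
The birth-year extraction described in points 2 and 4 could also be scripted rather than done by splitting columns by hand. The following is only a minimal sketch in Python, not the method I used: the sample date strings are hypothetical, and it assumes each usable date contains a four-digit year. Dates written out in words (such as Eighteen Sixty-six) would still need to be fixed by hand.

```python
import re
from collections import Counter

# Assumption: a usable birthdate contains a four-digit year between 1600 and 2099.
YEAR_PATTERN = re.compile(r"\b(1[6-9]\d{2}|20\d{2})\b")

def extract_birth_year(raw):
    """Return the four-digit year found in a date string, or None if there is none."""
    if not raw:
        return None
    match = YEAR_PATTERN.search(raw)
    return int(match.group(1)) if match else None

# Hypothetical examples of the mixed formats found in a birthdate column.
sample_dates = ["March 4, 1866", "03/04/1866", "1858", "N/A", ""]

years = [extract_birth_year(d) for d in sample_dates]

# Count how many people were born in each year, skipping missing dates.
births_per_year = Counter(y for y in years if y is not None)
```

Here `births_per_year` plays the same role as the facet counts in OpenRefine: it shows how many records fall in each birth year.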

 

Data Cleanup

Cleaning and organizing the data set.

1. Open the data as a CSV file using OpenRefine software. Open it under the Create Project option.

2. Hit “Create Project” once it has finished uploading.

3. Once the project has loaded, collapse all of the columns except for “Name” and “Birth Date.” To do this, use the drop-down menu and choose “View.”

4. The issue now is that we only want the year, not the full date of birth, and we want it written as a full numerical value. For example, Eighteen Sixty-six should be written as 1866. In order to separate our data, we must hit the drop-down arrow again on the “BirthDate” column and press “Edit Column.”

5. Press “Split into Several Columns,” and choose to separate by “.” Also be sure to uncheck the box that says “Remove This Column.” The data should now be more separated than before. However, the next issue is that some dates are written in the MM/DD/YYYY format. Choose “Edit Column” again, press “Split into Several Columns,” and separate the “BirthDate” column by “/”.

6. Your data should now be divided up into many columns.

7. Hit the Export button on the top right-hand side of your screen.

8. Export the file as an Excel sheet and save it to your desktop.

9. Open this Excel file.

10. Parse through your data and move entire columns containing the birth years to the right.

11. As you go, delete columns that contain no years or that repeat data.

12. Once your dates are in the same column, sort them chronologically using the “sort” button on your task bar.

13. After cleaning and removing all “N/A” pieces of data and years that do not belong in the data set, your data should be ready to upload into various network analysis software.

14. For purposes of clarity, only look at five years before and five years after the Civil War. Delete all rows containing people with birth years earlier than 1856 or later than 1870. Also remove those individuals whose birth years were during the Civil War (1861-1865).

15. Be sure to also save your file as a CSV file.

16. In order to use the data in UCINET, first change all dates to words. Again, for clarity, all years are identified by their final two digits, and they are written as follows: 1866 = “Six, Sixty.” This format mimics a name, which works well with the UCINET program.

17. For UCINET, make a mutual, two-way relationship between individuals and their birth years. This is the EdgeList. In Column 1, the names should be listed first. After the names are listed, the years should be listed. In Column 2, the years should be listed first. After the years are listed, the names should be listed. That way UCINET understands relationships as follows: Myers, George Holman “knows” 1866, and 1866 “knows” Myers, George Holman.
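
Steps 16 and 17 can be sketched in Python as well. This is only an illustration of the idea, not the way the files for this project were made. The function names are hypothetical, and the handling of years ending in zero (for example 1870 becoming just “Seventy”) is my own assumption, since the steps above only give the example of 1866.

```python
ONES = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine"]
TENS = {2: "Twenty", 3: "Thirty", 4: "Forty", 5: "Fifty",
        6: "Sixty", 7: "Seventy", 8: "Eighty", 9: "Ninety"}

def year_to_label(year):
    """Write a year's final two digits as a name-like label, e.g. 1866 -> 'Six, Sixty'."""
    tens, ones = divmod(year % 100, 10)
    if tens not in TENS:
        # Years ending in 00-19 are outside this project's range and are not handled.
        raise ValueError(f"no label defined for year {year}")
    if ones == 0:
        # Assumption: a year ending in zero is written as the tens word alone.
        return TENS[tens]
    return f"{ONES[ones]}, {TENS[tens]}"

def build_edge_list(people):
    """Turn (name, birth year) pairs into two-column rows for a two-way edge list."""
    # First block: name in Column 1, year label in Column 2.
    forward = [(name, year_to_label(year)) for name, year in people]
    # Second block: year label in Column 1, name in Column 2.
    backward = [(label, name) for name, label in forward]
    return forward + backward

edges = build_edge_list([("Myers, George Holman", 1866)])
```

Each person produces two rows, so the relationship runs in both directions: the name “knows” the year, and the year “knows” the name.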

This data should then be split into pre- and post-Civil War data sets (1856-1860 and 1866-1870). This leaves you with two distinct data sets, each saved as its own CSV file.
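
The split into two data sets can be sketched the same way. Again, this is a minimal sketch under assumptions: it expects rows of (name, birth year) pairs with the year already cleaned to a number, and the “Name” and “BirthYear” headers and the `write_csv` helper are hypothetical choices for illustration.

```python
import csv

PRE_WAR = range(1856, 1861)   # 1856 through 1860
POST_WAR = range(1866, 1871)  # 1866 through 1870

def split_by_period(rows):
    """Split (name, year) rows into pre-war and post-war lists.

    Rows from the war years (1861-1865) or outside 1856-1870 are dropped.
    """
    pre = [row for row in rows if row[1] in PRE_WAR]
    post = [row for row in rows if row[1] in POST_WAR]
    return pre, post

def write_csv(path, rows):
    """Save one data set as its own CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "BirthYear"])
        writer.writerows(rows)

# Apart from the first row, these records are made up for illustration.
records = [
    ("Myers, George Holman", 1866),
    ("Example, Person A", 1858),
    ("Example, Person B", 1863),
    ("Example, Person C", 1872),
]
pre_war, post_war = split_by_period(records)
```

Calling `write_csv` once for `pre_war` and once for `post_war` would give the two separate files.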

The data for this project, including the uncleaned original data, is provided below.

SJC Data Pre Civil War

SJC Data Post Civil War

SJCemetery Original Data