Methods & Materials

Our project work is based on Eighteenth Century Collections Online (ECCO), a digital historical archive of over 180 000 titles printed between 1701 and 1800 in Britain, Ireland, overseas territories under British colonial rule, and the United States (Gale, n.d.). ECCO also draws on the English Short Title Catalogue (ESTC), which is a catalogue of titles only and does not contain page images of the books. The University of Helsinki’s COMHIS research group uses the two repositories to investigate 18th-century publishing. Despite their size, ECCO and ESTC are less than fully representative in terms of document types, places of publication, reprints compared to first printings, and authors (Tolonen et al. 2022). Our group, like all previous and future project course groups, was given the opportunity to use the resources of COMHIS for what eventually became our research interest, the decorative initials.

As the previous project group had investigated genres in the book database, we were interested in whether genre could also be detected through the images in the books. To get image data, we depended on the work of COMHIS researchers, who had developed a way to detect images on book pages in the database and cluster them by similarity. The images in the books vary from full-page pictures to smaller strip-like decorations between chapters, but guided by our supervisors we decided to focus on the initials used at the beginning of chapters. As mentioned, these fall under two categories, decorated initials (DI) and factotums (FT), and at first we focused on both.

The COMHIS researchers applied several models to collect the images of initials from the database and produce the files we could use in our project. First, they applied the Mask R-CNN model to all the pages to extract bounding boxes, identifying whether a page has an initial and where on the page it is. Next, they applied the SimCLR model to find similar pages and then clustered the results. This resulted in a csv file containing around 50 000 DIs. As the last step, the researchers consulted the headpiece (HP) annotation table (a result of the COMHIS group’s research on headpieces) and chose the initials with a high likelihood in it. At the end of this step, they had a csv file containing around 2000 DIs based on the possible DI-HP pairings. We used this last file as the basis of our own annotations in this project; the 50 000 DI file became relevant to us later.

We started with a larger annotation round, in which each team member was assigned 36 model-produced clusters to annotate. Here we examined the DI and FT clusters to see whether matching initials had been placed together correctly. Some clusters were bigger and contained a mixture of different DIs. We noted the different groups of DIs we could find within a cluster and had them separated into distinct new clusters for the next round of annotation. Clusters that were large and contained a wide variety of initials we classified as messy and set aside for the time being. In the next annotation round, we determined whether some of the clusters should be merged. For this, a COMHIS researcher produced an HTML page for us showing one random example initial from each cluster. At this point we also decided to go forward with only the DIs in order to manage our time. To have more data, we then went through the messy clusters we had set aside and looked for more DIs to add. With these steps done, we had a final version of the DI clusters, which we then used for analysis.

The starting point of our analysis was Keith Maslen’s books on the printer ornaments of Samuel Richardson (Maslen, 2001) and William Bowyer (Maslen, 1974), as suggested by our instructor. At this stage we split into smaller groups: two groups of two people each looked into either Richardson’s or Bowyer’s DIs, comparing the DIs found in the books to those in our clusters and following the leads that came from the comparisons. The third group consisted of a single member who worked in parallel with the other groups, producing tables and graphs from the metadata that could then be used for analysis.

The two groups doing qualitative close reading of the London printers’ DIs also used Compositor (https://compositor.bham.ac.uk/), a website developed by a research group working on ECCO (Wilkinson et al. 2021), which made it important to our project as well. Also based on ECCO, the website provides a rudimentary user interface for performing image searches and visually analysing the decorations in the database’s books. We used the regular and visual search in Compositor to cross-check the images in our files and to uncover any missing data (for further details, see the subsection “Missing DIs” under “Reflection”). To compare the results from our clusters and Compositor, we first had to search for the book in Compositor using its title and publishing year. This usually brought up many different books, which had to be looked through manually. We then took the book’s ESTC ID (a unique identification code for each book in the ECCO database) and used it to search for the book in our csv files. For this we mainly used the csv file containing 50 000 DIs, as it also had DIs that were not in our final clusters but came up in Compositor’s search results. The ESTC ID was therefore important for connecting the books across the different repositories and files.

As our two groups researched Richardson’s and Bowyer’s DIs, they also consulted literature and other sources on the topics they came across while browsing the data, in order to write an analysis for the blog.
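
The ESTC ID lookup described above can be illustrated with a minimal sketch. It is not the script we actually ran: the column names (`estc_id`, `cluster`, `letter`), the sample rows, and the function name `find_initials` are all assumptions made for illustration, since the layout of the real csv files is not described here.

```python
import csv
import io

# Hypothetical stand-in for the 50 000-DI csv file. The real column names and
# IDs are not given in the text, so everything in this sample is an assumption.
SAMPLE_CSV = """estc_id,cluster,letter
T000001,12,A
T000002,12,A
T000001,40,T
N000003,7,W
"""


def find_initials(csv_file, estc_id):
    """Return every row whose ESTC ID matches, ignoring case and stray spaces."""
    wanted = estc_id.strip().upper()
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["estc_id"].strip().upper() == wanted]


# An ID copied by hand from a search result may carry spaces or lower case.
matches = find_initials(io.StringIO(SAMPLE_CSV), " t000001 ")
for row in matches:
    print(row["estc_id"], row["cluster"], row["letter"])
```

Normalising the ID before comparing is a small design choice that guards against copy-and-paste differences between the website and the csv file.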

In our work on the metadata and graphs, we looked into the connections between the DIs and genre. We used a range of methods to analyse our dataset, which spanned publications from 1708 to 1780 and included 65 unique decorative initial clusters. To rank diversity by genre, we used a module-based approach to identify typographic trends. Heatmaps were created to visualise the concentration of superclasses and letter distributions across different genres. Literature and Language showed the highest diversity, with 49 superclasses and 19 letters, while the Law module exhibited the least diversity, with 6 superclasses and 4 letters.
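
A diversity ranking of this kind comes down to counting distinct superclasses and letters per module. The sketch below shows one way to do that, assuming each record is a (module, superclass, letter) triple; the sample records, the labels such as `c01`, and the function name `diversity_by_module` are hypothetical and do not reproduce our data.

```python
from collections import defaultdict

# Hypothetical records: (module, superclass label, letter). Not project data.
records = [
    ("Literature and Language", "c01", "A"),
    ("Literature and Language", "c02", "T"),
    ("Literature and Language", "c02", "T"),
    ("Law", "c01", "A"),
    ("Religion and Philosophy", "c03", "W"),
    ("Religion and Philosophy", "c04", "W"),
]


def diversity_by_module(rows):
    """Count distinct superclasses and letters per module, most diverse first."""
    superclasses = defaultdict(set)
    letters = defaultdict(set)
    for module, superclass, letter in rows:
        superclasses[module].add(superclass)
        letters[module].add(letter)
    table = [(m, len(superclasses[m]), len(letters[m])) for m in superclasses]
    # Sort by superclass count, then letter count, then name for a stable order.
    return sorted(table, key=lambda t: (-t[1], -t[2], t[0]))


ranking = diversity_by_module(records)
for module, n_superclasses, n_letters in ranking:
    print(module, n_superclasses, n_letters)
```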

Temporal trends in DI usage were examined by tracking patterns across decades, revealing a decline after the 1720s and 1730s, possibly due to changing aesthetic preferences and technological advancements. We used heatmaps to investigate printer diversity and specialisation, normalising the distribution of DI clusters and letters. This method helped us identify notable printers, such as Watts, J., who stood out for high diversity.
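
The text above does not spell out how the distributions were normalised. One common choice, shown here purely as an assumption, is to convert each printer's raw counts into shares that sum to 1, so that printers with many and few surviving books can be compared in the same heatmap. The observations, the name "Printer B", and the function name `normalised_shares` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (printer, cluster) observations. Not project data.
observations = [
    ("Watts, J.", "c01"), ("Watts, J.", "c02"),
    ("Watts, J.", "c03"), ("Watts, J.", "c01"),
    ("Printer B", "c01"), ("Printer B", "c01"),
]


def normalised_shares(pairs):
    """Turn raw counts into per-printer shares that sum to 1 for each printer."""
    counts = defaultdict(Counter)
    for printer, cluster in pairs:
        counts[printer][cluster] += 1
    shares = {}
    for printer, counter in counts.items():
        total = sum(counter.values())
        shares[printer] = {c: n / total for c, n in counter.items()}
    return shares


shares = normalised_shares(observations)
for printer, row in shares.items():
    print(printer, row)
```

A printer whose shares are spread over many clusters reads as diverse, while a printer concentrated in a single cluster reads as specialised.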

To assess relationships between module and typographic complexity, we performed a correlation analysis. This revealed a negative correlation in Religion and Philosophy but a positive correlation in Social Sciences and Literature. An entropy analysis helped quantify diversity levels across modules, showing History and Geography had the highest entropy.
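
For readers unfamiliar with entropy as a diversity measure, here is a minimal sketch using Shannon entropy in bits. Which entropy variant and logarithm base the analysis used is not stated above, so base 2 is an assumption, and the sample labels and the function name `shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy in bits of the label distribution (0 means no diversity)."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Each label contributes p * log2(1 / p), where p is its share of the total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical cluster labels for two modules. Not project data.
even_module = ["c01", "c01", "c02", "c02"]    # two clusters used equally often
single_module = ["c01", "c01", "c01", "c01"]  # one cluster only

print(shannon_entropy(even_module))
print(shannon_entropy(single_module))
```

Entropy grows both with the number of distinct clusters and with how evenly they are used, which is why it can separate modules that have the same number of clusters.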

For a comparative analysis of Bowyer and Richardson, we manually examined their works for DI usage patterns. Bowyer’s works showed significant usage in General Reference but none in Law, while Richardson’s DI usage was more consistent across modules, although it never reached Bowyer’s peak. Together, these methods gave us an overview of DI trends and their relationship to genre and printer specialisation.

Our general communication took place in a dedicated Slack channel, where, besides staying up to date with the assignments, everyone could post or comment whenever they had questions. All the csv and html files produced by the clustering were stored in the Puhti environment, where we also had access to metadata pertaining to our work. In addition, we had a Google Drive folder with reading material, reading and meeting notes, the research plan, and other administrative files.
