Implications and Conclusions

Table of Contents

  1. Introduction and Background
  2. Data
  3. Methods and Key Measures
  4. Results
  5. Discussion
  6. Limitations
  7. Implications and Conclusions
  8. Division of Labor
  9. References

Given the limited data available to us, the clustering we have done should be treated as an initial attempt at the research question. Although the clusters resulting from the Tonson dataset look promising, our results on Dataset1 are not robust enough to be relied on for enriching the datasets we are working with. We therefore recommend first gathering a larger dataset and identifying the documents for which metadata can be found.

Ornaments in the works of individual printers (e.g. Samuel Richardson) have already been annotated and analyzed manually. In this project, we tested that detailed method in depth using machine learning. We recommend repeating the work with a larger dataset, especially with the documents whose metadata already contains printer IDs. Another direction is to investigate algorithms that cluster at a finer granularity with respect to the same shape. Finally, we recommend training an algorithm on one printer who is suspected to have also printed politically opposing statements (e.g. John Darby II) and then scanning works published without a printer's name, in particular to see whether that printer's anonymous output can be identified.
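
To make the last recommendation concrete, a minimal sketch of the detection step is given below. It assumes that ornaments cropped from works carrying the suspected printer's imprint and from anonymous works have already been turned into fixed-length feature vectors; the file names, array layout, and distance threshold are hypothetical, not part of our current pipeline.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical inputs: feature vectors for ornaments from works that carry
# the suspected printer's imprint, and for ornaments from anonymous works.
known_vecs = np.load("darby_ornament_embeddings.npy")      # shape (n_known, d)
anon_vecs = np.load("anonymous_ornament_embeddings.npy")   # shape (n_anon, d)
anon_ids = np.load("anonymous_document_ids.npy", allow_pickle=True)

# Index the suspected printer's ornaments and look up the closest match
# for every ornament found in an anonymous work.
index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(known_vecs)
distances, _ = index.kneighbors(anon_vecs)

# Flag anonymous documents whose ornaments sit very close to the known set;
# the threshold is an assumption and would need calibration on held-out data.
THRESHOLD = 0.15
candidates = {doc for doc, dist in zip(anon_ids, distances[:, 0]) if dist < THRESHOLD}
print(f"{len(candidates)} anonymous documents share near-identical ornaments")
```

Any documents flagged this way would only be candidates for qualitative close reading, not firm attributions.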

Within this project, there were several ways in which we could have given our individual and group work a stronger digital humanities focus. First, we did not qualitatively explore the technical YOLO output: to draw meaningful humanities insight from this work, the results would need further exploration and theorizing, for example through a qualitative approach. Understanding potential biases in this step could then inform adaptations of the model to lower the project's overall bias, where possible. Given that one of the biggest limitations of our clustering results was the quality of the source data, which came unfiltered from YOLO's classification step, such changes could translate into significant differences in our final results.

Second, NORPPA has the potential to be a good algorithm for finding printer fingerprints, as it tries to cluster images that share the same content (the same headpieces or initials). Images that are identical apart from small differences caused by erosion of the printing devices would therefore still be clustered together, which would enable predicting printer IDs for other works published with the same printing devices. The combination of two printing ornaments on the same page (such as a headpiece and an initial) would, in particular, allow a collection of ornaments printed by the same printer to be assembled through a mapping of the ornaments. A question requiring further analysis is whether ornaments surrounding differing text, or ornaments distorted by the digitization process, would also be clustered together. The metadata analysis done for the other clustering result could be reused on these results to check their overall quality. If the results are as good as predicted, a qualitative analysis of single clusters whose metadata indicates poorly fitting results could reveal biases in the clustering algorithm, or even in the metadata itself. If the results instead point to problems with the algorithm, data augmentation techniques could be considered further. Next, since NORPPA bases its clusters on content, more printer IDs could be predicted and fuller profiles of single printers created. The CNN-based neural embeddings, on the other hand, cluster more on style. These two different clustering foundations could be used to examine how strongly style and content correlate in a printer's "fingerprint", and whether some printers reproduce the same style more often than others.

Further, the data collection could be stored in a way that is accessible both for technical work (such as feeding a clustering model) and for analysis by humanities researchers. The existing data is a massive single file that requires technical expertise, which poses a barrier to humanities accessibility and severely restricts the filtering and feature search that would be useful in an analysis. Being able to easily create subsets of interest would open up many more research questions concerning single printers, single regions, or other features of interest. Moreover, once a region of interest has been identified through such filtering, the resulting subset could be studied in multiple ways, such as through visualizations that support analysis. To that end, the work could be further developed to produce the publication map visualization.
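
As an illustration of the publication-map idea, the sketch below assumes the enriched metadata has already been exported to a filterable table containing resolved place names and coordinates; the file name and column names are assumptions rather than existing outputs of our pipeline.

```python
import pandas as pd
import plotly.express as px

# Assumed subset of the enriched metadata: one row per publication, with a
# resolved place of publication and its coordinates.
pubs = pd.read_parquet("publications_subset.parquet")  # hypothetical export

# Aggregate to one marker per city, sized by how many publications we have.
per_city = (
    pubs.groupby(["city", "lat", "lon"], as_index=False)
        .agg(publications=("document_id", "nunique"))
)

fig = px.scatter_geo(
    per_city,
    lat="lat",
    lon="lon",
    size="publications",
    hover_name="city",
    title="Publications per place of printing (illustrative)",
)
fig.show()
```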
This visualization could be supplemented with a comparative analysis of regions and cities, exploring potential publication patterns in terms of, for example, the number of publications or their dates. Combined with what we know about publishing in this period and about how knowledge and ideas traveled, this could reveal aspects of how printers operated that previous findings have not yet shown, and in turn feed back into our analyses (including their biases).
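
The comparative analysis could start from the same (assumed) table, for example by counting publications per city and decade:

```python
import pandas as pd

pubs = pd.read_parquet("publications_subset.parquet")  # hypothetical export

# Bucket publication years into decades and compare cities over time.
pubs["decade"] = (pubs["year"] // 10) * 10
per_city_decade = pubs.pivot_table(
    index="decade", columns="city",
    values="document_id", aggfunc="nunique", fill_value=0,
)
print(per_city_decade)
```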

As further refinement of our research questions, we pose three ideas that we believe would yield additional meaningful insight into printer networks and, consequently, the dissemination of knowledge.

The first question is: how do our clustering results compare to previous findings? Specifically, there are network analysis results (such as that of Valleriani, 2022) that we could study. This comparison would allow us either to support the current understanding of these networks, to expose gaps that need further study, or to surface diverging results; contradictory results in particular would open the door to studying the differences and their causes. More abstract theories, such as the flow of information or network behavior, could also be analyzed further with a fuller dataset.

A second research question is which printers show more (or less) stylistic variance within their prints. This could be answered by counting how many styles (or clusters) are represented within a single printer's works, a slight modification of our previous work: instead of clustering, we would count how many clusters each printer's printmarks fall into, and then compare which printers concentrate their printmarks within clusters (low variance) and which spread them across clusters (higher variance), as sketched below.

A third idea would be to study key printers of politically opposing texts identified through printer ID prediction, and to map single printers to different movements and identified flows of information dissemination. As a first step, we would have to identify the printers of these opposing publications, many of which appeared under a pseudonym or with no name at all; our clustering analysis could help here, since it uses the various printmarks within an anonymized work to point to a printer. A qualitative analysis and close reading of the works in question, together with a mapping to movements via the ideas or concepts they mention, would additionally be needed. This analysis could shed new light on how information was disseminated during the political revolutions of the 18th century.
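
For the second idea, the counting step is straightforward once every ornament carries both a printer ID and a cluster label; the sketch below assumes such a table exists, and the file and column names are hypothetical.

```python
import pandas as pd

# Assumed table: one row per ornament, with the printer it is attributed to
# and the cluster the ornament was assigned to.
ornaments = pd.read_parquet("ornament_clusters.parquet")  # hypothetical export

per_printer = ornaments.groupby("printer_id").agg(
    ornaments=("ornament_id", "count"),
    clusters=("cluster_label", "nunique"),
)
# Rough variance proxy: distinct clusters used per ornament. Values near 0
# suggest a printer re-using a narrow set of styles, values near 1 a wide
# stylistic range.
per_printer["style_spread"] = per_printer["clusters"] / per_printer["ornaments"]
print(per_printer.sort_values("style_spread"))
```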

< Previous section: Limitations | Next section: Division of Labor >