Autumn 2023: Section-Level Genre Analysis

Introduction

In this research project, we focus on uncovering the genres present within books that have traditionally been categorized under a single main genre label. We aim to challenge the notion that a book labeled with a particular genre is monolithic in its content. Using a BERT-based language model, we analyze a diverse collection of 18th-century texts from the ECCO (Eighteenth Century Collections Online) dataset, which covers a broad spectrum of genres and subject matter. Our primary objective is to investigate quantitatively where and how genre shifts within these labeled texts. In doing so, we seek to shed light on the interplay of genres that coexist within a single work, beyond what its single label suggests. Two fundamental questions emerge: “To what extent does a book labeled under a specific genre actually adhere to that label?” and “How does each genre characteristically relate to other genres?”

This project report is divided into six parts: Introduction, Background, Materials and Methods, Analysis, Conclusion, and Division of Labour. We first go through the theoretical background of genre analysis, then describe the data collection process, how the labeling was done, and what training data we have for the genre classifier. Next, we train the classifier and report its prediction accuracy. In the analysis chapter, we look at the general trends within each genre and then illustrate the interplay of genres using individual books. We also explore how stylometric methods could help determine the text structure characteristic of each genre across different types of 18th-century texts.
