Spring 2023: Traces of Macbeth: A Study of Text Reuse

Division of work & reflections on learning

Bruce

I mainly worked on data exploration and data processing, writing scripts that prepared the data for downstream analysis. After that, I devoted most of my time to visualizing the embeddings of the reuse passages and to topic modeling. I carried out these individual tasks while working with the group toward the project's overarching goals. The course was a perfect exercise in identifying data-driven needs and finding the appropriate data science solution, thereby using my technical knowledge to solve real-life problems. I have also learned to work more effectively with data science tools and gained a better sense of how their theoretical performance translates to the task at hand.

All in all, I believe that the course has been a tremendous learning experience. The topic was fascinating, requiring both technical knowledge and often a good deal of creativity. In the process, I have had a lot of fun exploring the many ways of approaching this problem, then implementing the solutions and seeing the results. I have learned not only data science skills but also how to work together in a team to solve a problem. I think this experience will certainly come in handy in my career, whatever I end up doing.

Ilari

The main parts that I worked on in this project were data exploration, data processing, passage clustering, and the reuse network analysis. A significant portion of my work went into exploring the data we were given and trying to clean it (the data was horribly messy), which meant examining its different features and trying to make sense of it all through statistics and visualizations. Although most of that work is not shown in our final report, the findings from my data exploration helped guide the discussion and research plans during our weekly meetings. Of the work described in our final report, I was responsible for the OCR correction with GPT, the implementation of the passage clustering, and the network analysis. This was a challenging project, I think largely because we were free to choose our own research direction. We were not really given any prompt other than “explore this reuse data”, so most of our group meetings were spent trying to find a new direction whenever something fell flat. Toward the end, however, we overcame this problem as a group. Overall, I learnt how to work around stagnating progress and evolving goals. Finally, I now value face-to-face meetings much more.

Yu An

I worked on the section “Framing in target texts and intertextuality analysis”. This course has helped me improve my 1) project management skills, 2) data management skills, and 3) technical skills. In terms of project management, this was the first time I proactively used a Kanban board to track and share my progress with other group members. It turned out to be a strong motivator. I remember that, especially in the initial stage of the project, we all felt uncertain about the research questions. With proactive tracking and sharing, the problems were broken into manageable pieces, and here we are. I believe this was key to the project's success. Second, I have found that data management is crucial for productivity. This course was the first time I put data management skills into practice: numbering the notebooks I created, version-controlling only the scripts rather than the data, and writing documentation promptly. This has improved my efficiency, and I now realize that the time spent staying organized is much shorter than the time otherwise wasted disentangling previous work. The technical skills I have improved include speed-reading data science papers for inspiration on research topics, deriving a data science problem from given data, choosing an algorithm for the problem, exploring data qualitatively, and writing scripts that run on HPC systems. One lesson was that the best starting algorithm is the simplest and most accessible one. Because a data science project is iterative, a simple method, even one that is probably not accurate enough, at least provides a baseline and points the way forward. I see this as a guideline for my future projects. Lastly, I would say it is fun to explore the data case by case. I enjoyed reading Macbeth and the reuse snippets; the reading experience is unique and far removed from modern life.

Antti

I assisted the humanities students by establishing descriptive facts about the data and by providing visualizations. I learned that progress in analyzing data is not linear and that things often do not go as smoothly as one would expect. Most weeks I expected to get much more done than I eventually did, and I spent a lot of time on tasks that had seemed simple beforehand. I also noticed a psychological phenomenon: when another member of our team asked me to perform an analysis, I often felt compelled to promise to finish it unnecessarily soon. In the future I will try to avoid this. More concretely, my programming skills were quite limited before this course, and I think I gained some routine in analyzing and visualizing data with Python. Also, I had not really used GitHub before and, although I am still quite afraid of it, I am maybe a little less afraid than before.

Imama

I worked on several parts: the Introduction and the Target text reuses (Samuel Johnson and William Richardson), where I examined the most frequent reuses by these authors. I also tried to interpret the topic modeling conducted by Bruce; here I briefly approached it from a humanities point of view and confirmed the clustered topics, as I had done a lot of close reading while examining the reuses. It was my first multidisciplinary project, and I initially had doubts about collaborating with students from different disciplines. However, the experience turned out to be much smoother and more enjoyable than I anticipated. I gained valuable insights, particularly in navigating between close and distant reading approaches. I also learned more about how to interpret the data provided by the data science students, formulate better questions, and find compromises that accommodated everyone's interests. The process reinforced my views on the importance of clean data and effective communication among team members. Timing and asking the right questions proved crucial in most cases. Although there were moments when I felt hesitant to ask seemingly “stupid” questions, open discussions and regular meetings made the process more comfortable. Witnessing the remarkable outcome of our collaborative efforts makes me even happier and prouder now. It also made me realize the potential of digital humanities expertise in multidisciplinary projects and wonder what else could be done, and which further skills could be learned. In the future, I hope to take part in the technical side as well and see what kinds of results could be achieved with a humanities background. The course workload was undeniably substantial compared to other regular courses, but I suppose this was to be expected given the nature of the project. I am grateful for the opportunity to work with responsible, punctual, and supportive group members.

Overall, I learned the importance of prioritization, effective timing, understanding one's responsibilities within the team, and exploring areas of personal interest, all of which made a significant difference to the project's success.

Jimena

My role consisted of doing background reading and, once we had solid data, trying to make sense of it using what I had learnt from the literature. I wrote the Background together with Vilma, the “Versions of Macbeth” analysis together with Antti, and “Who reused Macbeth and how?” with Ilari and Imama (I specifically took care of the paragraphs on general reuse and of the subsection on Edward Bysshe). I feel that somewhere along the way I also took on a more organizational role, perhaps one of encouraging the group to meet and communicate more often.

My main takeaway from this project is that communication between data science and the humanities is not only possible but desirable. By using data, we can challenge long-held ideas in the humanities that may not be entirely accurate. We can also add new layers to the information that we already have. In the beginning, bridging the gap between people of such diverse backgrounds was definitely not easy, but once we found the right questions and approach, the data kept coming, which in turn raised more questions, and so on. At times I found it difficult to strike a healthy balance between trusting the data blindly and mistrusting it entirely. In the end, I still think human intuition about what makes sense should prevail, but I now feel even more motivated to learn about computer science and its intersections with language.

Regarding Shakespeare, I've become aware of the enormous impact that his works and their publication had on the publishing world as we know it today. This project also opened a whole Pandora's box of the history of publishing and printing for me, where I feel there is still a lot to unravel and clarify. To summarize, this was a fun and challenging project, and it was refreshing to get out of the “pure” humanities environment where I usually dwell.

Vilma

My role in this project was to find relevant background information, analyze the results provided by the data science students, and write and edit the blog post. More specifically, I focused on the background section, the analysis of the most reused passages, and the conclusion. I wrote the background section together with Jimena, and we also edited the text together. For me, the biggest challenges in this project were interpreting the data, finding relevant information for the background, scheduling meetings, and narrowing down the topic. While I have previously worked with large datasets, the data used in this project was quite different from what I am used to. Evaluating the results at different stages proved very interesting and helped me better understand the issues involved in working with large amounts of data. Similarly, because our group was so large, defining the scope of the project was quite challenging. As the instructions for this assignment were so open-ended, choosing a topic and figuring out what was feasible and manageable was difficult. Setting smaller goals was also hard because our topic kept evolving, but this helped me develop my project management skills. In addition, finding relevant and usable sources was challenging, as there is so much information available on Shakespeare. This project helped me develop my group work skills, and I also gained experience in working on a varied and fairly extensive project, which will definitely help me with my MA thesis. It also gave me insight into how multidisciplinary projects can be conducted, how different fields can complement each other, and how computational methods can be used to study language and literature.


University of Helsinki