Lightning Talk Digital Humanities Australasia 2018

Ghostwriting problem of Yasunari Kawabata's Novel Soranokatakana (40)

Hao Sun 1, Mingzhe Jin 2
  1. Graduate School of Culture and Information Science, Doshisha University, Kyotanabe, Kyoto, Japan
  2. Faculty of Culture and Information Science, Doshisha University, Kyotanabe, Kyoto, Japan

This study presents a data-analysis-based method for verifying the actual author of Soranokatakana. Soranokatakana is a novel published under the name of Yasunari Kawabata, a famous novelist in Japanese literary history and the first Japanese winner of the Nobel Prize in Literature. However, Soranokatakana has long been suspected of having been written by Kentaro Uchida. To the best of our knowledge, the ghostwriting problem of Soranokatakana has never been thoroughly discussed. In this study, we attempt to identify the author of Soranokatakana from the perspective of authorship attribution. Our method consists of three steps: establishing a corpus, extracting stylometric features, and applying machine learning algorithms. First, authorship attribution generally requires a corpus that includes works by both Yasunari Kawabata and the possible ghostwriter (Kentaro Uchida). As there are very few works written by Kentaro Uchida, we instead use the novels of three famous novelists (Kan Kikuchi, Shusei Tokuda, and Richi Yokomitsu) who were contemporaries of Yasunari Kawabata. We selected 20 novels by each novelist mentioned above from their collected writings to establish the corpus. Then, stylometric features for analysing writing style were extracted from all the novels in the corpus. To quantify each novelist's writing style, we chose well-known, high-performing stylometric features: semiotic features (comma position), syntactic features (part-of-speech bigrams), and clause patterns. Finally, all the stylometric features were fed to machine learning algorithms to examine whether the writing style of Soranokatakana differs from Yasunari Kawabata's.
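The feature-extraction step can be illustrated with a minimal sketch. It assumes each text has already been segmented into (surface form, part-of-speech) pairs by a morphological analyser; the abstract does not say which tool was used, and the function names, the tag labels, and the toy sentence are hypothetical. Comma position is approximated here as the distribution of the token immediately preceding each comma, which is one common way to operationalise the feature and not necessarily the authors' exact definition.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: tokens arrive as (surface, part_of_speech) pairs from some tagger.
Token = Tuple[str, str]


def pos_bigram_profile(tokens: List[Token]) -> Dict[Tuple[str, str], float]:
    """Relative frequency of each adjacent part-of-speech pair."""
    tags = [pos for _, pos in tokens]
    counts = Counter(zip(tags, tags[1:]))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()} if total else {}


def comma_position_profile(tokens: List[Token], comma: str = "、") -> Dict[str, float]:
    """Relative frequency of the surface form that directly precedes each comma."""
    counts = Counter(
        prev_surface
        for (prev_surface, _), (surface, _) in zip(tokens, tokens[1:])
        if surface == comma
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


# Hypothetical toy sentence, tagged by hand for illustration only.
toy_tokens = [
    ("私", "noun"), ("は", "particle"), ("、", "symbol"),
    ("本", "noun"), ("を", "particle"), ("、", "symbol"),
    ("読む", "verb"),
]
bigram_profile = pos_bigram_profile(toy_tokens)
comma_profile = comma_position_profile(toy_tokens)
```

Using relative rather than raw frequencies keeps novels of different lengths comparable, which matters when the resulting profiles are passed to a clustering or classification step.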
We applied two unsupervised machine learning algorithms, correspondence analysis (CA) and hierarchical cluster analysis (HCA), and five supervised machine learning algorithms, adaptive boosting (AdaBoost), high-dimensional discriminant analysis, logistic model tree (LMT), random forest (RF), and support vector machine (SVM), to examine the ghostwriting problem of Soranokatakana. The results of our quantitative analysis show that the writing style of Soranokatakana is closer to that of Yasunari Kawabata than to those of Kan Kikuchi, Shusei Tokuda, and Richi Yokomitsu. This suggests that Soranokatakana was very likely written by Yasunari Kawabata rather than by the possible ghostwriter, Kentaro Uchida.
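As a rough illustration of the unsupervised side of this analysis, the sketch below runs a small hierarchical cluster analysis over relative-frequency profiles. The abstract does not state which distance measure or linkage rule was used, so Euclidean distance and average linkage are assumptions. The text labels and all numbers are invented toy values, not data or results from the study.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, FrozenSet, List, Set, Tuple

Profile = Dict[str, float]


def distance(p: Profile, q: Profile) -> float:
    """Euclidean distance between two sparse feature profiles (assumed metric)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def average_linkage(profiles: Dict[str, Profile]) -> List[Tuple[Set[str], Set[str], float]]:
    """Agglomerative clustering; returns merges in the order they occur."""

    def linkage(pair: Tuple[FrozenSet[str], FrozenSet[str]]) -> float:
        # Mean pairwise distance between the members of the two clusters.
        a, b = pair
        return sum(distance(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((set(a), set(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Invented toy profiles over two hypothetical features, f1 and f2.
toy_profiles = {
    "author_A_1": {"f1": 0.50, "f2": 0.10},
    "author_A_2": {"f1": 0.48, "f2": 0.12},
    "author_B_1": {"f1": 0.10, "f2": 0.50},
    "author_B_2": {"f1": 0.13, "f2": 0.53},
    "disputed": {"f1": 0.44, "f2": 0.17},
}
merges = average_linkage(toy_profiles)
for left, right, dist in merges:
    print(sorted(left), "+", sorted(right), round(dist, 3))
```

In a dendrogram built from such merges, a disputed text that joins one author's cluster well before the clusters of the other authors merge in is stylistically closer to that author. That is the kind of evidence a cluster analysis contributes alongside the supervised classifiers.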