The Odense Twin Corpus

The Odense Twin Corpus is one of the core components of the project. The corpus is intended both to substantially increase the available acquisition corpora for the Danish language and to facilitate the testing of hypotheses relating to inter-individual differences, including gender and regional variety, and the influence upon them of the family speech milieu. The size of the corpus and the period over which it extends make it possible to study important questions concerning the acquisition of Danish as a first language in great detail. The data consist of recordings of the twins with their families, enabling a close study of input and its role in language acquisition. Finally, by studying twins it is possible to isolate factors for which it is otherwise difficult to control, particularly differences related to age and gender.

Facts about the corpus

* The Odense Twin Corpus is a longitudinal study of six pairs of twins and their families.
* The twin families were found via the Danish Twin Register at Odense University Hospital.
* The twins are systematically varied according to gender and zygosity, so that there is one twin pair in each of the following groups: monozygotic: girl-girl, boy-boy; dizygotic: girl-girl, boy-boy; plus two dizygotic girl-boy pairs.
* The parents have been selected so that different regional standards are represented within the families.
* The collection of data began in May 1999.
* The data consist of audio and video recordings of the twins at home with their families, along with parental reports (CDIs).
* Status of data collection (February 14th, 2001):
  * sessions recorded to date - total: 108
  * recordings digitized (for transcription) - total: 59
  * sessions transcribed - total: 38
* Recordings are made at intervals of approximately three weeks.
* The twins were between 9 and 12 months of age (one pair 5 months) when the recordings began.
* Recorded sessions include a seated meal and a free-play situation in which the twins play alone or together with one or more members of their family.
* Each session is approximately 60 minutes long.
* Both the audio and the video recordings are digital (made with a DAT recorder and a digital video camera respectively).
* Recordings are transcribed and coded using CLAN as part of the CHILDES system (The Child Language Data Exchange System). Transcripts are linked to the digitized audio files.
* It is the intention that the digitized audio files will be made publicly accessible via the internet.
* The Twin Corpus is used as data in different ways in the sub-projects, e.g., Early Constructions and Frames at Dinnertime.
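
The status figures listed above (108 sessions recorded, 59 prepared for transcription, 38 transcribed) can be turned into simple progress shares. The following is only an illustrative sketch: the variable and function names are hypothetical and are not part of the project's own tooling.

```python
# Session counts as reported in the status list for February 14th, 2001.
recorded = 108      # sessions recorded to date
digitized = 59      # recordings converted for transcription
transcribed = 38    # sessions transcribed


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Hypothetical summary keys, chosen for this illustration only.
progress = {
    "digitized_of_recorded": share(digitized, recorded),
    "transcribed_of_recorded": share(transcribed, recorded),
    "transcribed_of_digitized": share(transcribed, digitized),
}

for name, pct in progress.items():
    print(f"{name}: {pct}%")
```

On these figures, a little over half of the recorded sessions had been prepared for transcription, and roughly a third had been transcribed.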