The Twin-Project


The Odense Twin Corpus

The Odense Twin Corpus is one of the core components of the project. The corpus is intended both to substantially increase the available acquisition corpora for the Danish language, and to facilitate the testing of hypotheses relating to inter-individual differences, including gender and regional variety, and the influence upon them of the family speech milieu. The size of the corpus and the period over which it extends make it possible to study important questions concerning the acquisition of Danish as a first language in great detail. The data consist of recordings of the twins with their families, enabling a close study of input and its role in language acquisition. Finally, by studying twins it is possible to isolate factors for which it is otherwise difficult to control, particularly differences related to age and gender.

Facts about the corpus

* The Odense Twin Corpus is a longitudinal study of six pairs of twins and their families.
* The twin families were found via the Danish Twin Register at Odense University Hospital.
* The twins are systematically varied according to gender and zygosity, so that the following groups are represented: monozygotic: one girl-girl pair and one boy-boy pair; dizygotic: one girl-girl pair, one boy-boy pair, and two girl-boy pairs.
* The parents have been selected so that different regional standards are represented within the families.
* The collection of data began in May 1999.
* The data consist of audio and video recordings of the twins at home with their families, along with parental reports (CDIs).
* Status of data collection (February 14th, 2001):
  * recorded sessions to date - total: 108
  * recordings that have been digitized (for transcription) - total: 59
  * sessions transcribed - total: 38
* Recordings are made at intervals of approximately 3 weeks.
* The twins were between 9 and 12 months of age (one pair 5 months) when the recordings began.
* Recorded sessions include a seated meal situation and a free-play situation, in which the twins play alone or together with one or more members of their family.
* Each session is approximately 60 minutes long.
* Both audio and video recordings are digital (made with a DAT recorder and a digital video camera, respectively).
* Recordings are transcribed and coded using CLAN as part of the CHILDES system (The Child Language Data Exchange System). Transcripts are linked to the digitized audio files.
* The intention is to make the digitized audio files publicly accessible via the internet.
* The Twin Corpus is used as data in different ways in the sub-projects, e.g., Early Constructions and Frames at Dinnertime.