
The history of COBUILD

A new generation of dictionaries for learners of English

When the first COBUILD dictionary was published in 1987, it revolutionized dictionaries for learners. It was the first of a new generation of dictionaries based on real examples of English, the type of English that people speak and write every day. It completely changed approaches to dictionary-writing and led to a whole range of Corpus-driven dictionaries and reference materials for learners of English.

In the 1980s, in a project led by Professor John Sinclair, Collins and the University of Birmingham developed the electronic corpus from which these examples of English were taken. This corpus became the largest collection of English language data in the world, and COBUILD uses the Collins Corpus to analyze the way that people really use the language.
From the 1990s onwards the Corpus was also used as a basis for grammar and language reference titles as well as dictionaries.

The principles established by the COBUILD project and the Collins Corpus continue to be at the heart of all our publishing. This means that when you see COBUILD or ‘Powered by COBUILD’ on Collins materials, you can be sure that they are based on the most up-to-date information about English available, and that they will help learners with the language they need to communicate effectively and accurately.

Who created COBUILD?

John Sinclair was Founding Editor-in-Chief of the first edition of the COBUILD Dictionary. An outstanding linguist and scholar, he was Professor of Modern English Language at the University of Birmingham for most of his career, and was one of the very first modern Corpus linguists. He led the COBUILD project in lexical computing, funded by Collins, that revolutionized lexicography in the 1980s and resulted in the creation of the largest Corpus of English language texts in the world.
Professor Sinclair personally oversaw the creation of this pioneering electronic Corpus, and was instrumental in developing the tools needed to analyze the data. Having Corpus data allowed Professor Sinclair and his team to find out how people really use the English language and to develop new ways of structuring dictionary entries.
For example, frequency information allowed the team to rank the senses of a word by importance and usefulness to the learner, so that the most common meaning comes first. The Corpus also highlights collocates (words that often go together), information that previous dictionaries had covered only sketchily. Under Professor Sinclair’s guidance, the team also developed a full-sentence defining style, which not only explains the meaning of a word but also shows that word in its grammatical context.
Professor Sinclair worked on the Collins COBUILD range of titles until his retirement, when he moved to Florence, Italy, and became president of the Tuscan Word Centre, an association devoted to promoting the scientific study of language. He remained interested in dictionaries until his death, and the Collins COBUILD range of dictionaries remains a testament to his revolutionary approach to lexicography and English language learning.

The Collins Corpus

What’s in the Collins Corpus?

The Collins Corpus is an analytical database of English with over 20 billion words. It contains written material from websites, newspapers, magazines and books published around the world, and spoken material from radio, TV and everyday conversations. New data is fed into the Corpus every month to help the Collins dictionary editors identify new words and meanings as soon as they begin to appear.

What does the Corpus tell us?

All COBUILD dictionaries are based on the information we find in the Collins Corpus. The full Corpus contains 4.5 billion words. The Bank of English™ is a subset of 650 million words from a carefully chosen selection of sources, to give a balanced and accurate reflection of English as it is used today. Our lexicographers use the Bank of English™ every day, and they use the full Collins Corpus to check more widely for extra information.
Because the Collins Corpus is so large, we can look at many examples of how people really use words. The data tells us what words mean, how they are used, which words are used together, and how often they occur. This frequency information helps us decide which words to include in the COBUILD dictionaries.
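To make the idea of frequency concrete, here is a minimal Python sketch that counts word forms in a text and normalizes a count to occurrences per million words, a common way of comparing frequencies across corpora of different sizes. The function names and the simple regular-expression tokenizer are assumptions for illustration only; this is not the software used with the Collins Corpus.

```python
import re
from collections import Counter

# Assumed, deliberately simple tokenizer: lowercase runs of letters, allowing
# internal apostrophes or hyphens ("don't", "full-blown"). Illustrative only.
TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def word_frequencies(text: str) -> Counter:
    """Count how often each word form occurs in `text`."""
    return Counter(TOKEN_RE.findall(text.lower()))


def per_million(freqs: Counter, word: str) -> float:
    """Frequency of `word` normalized to occurrences per million tokens."""
    total = sum(freqs.values())
    return freqs[word] / total * 1_000_000 if total else 0.0


sample = (
    "Police say the crime was committed at night. "
    "Anyone who commits a crime must face the consequences of the crime."
)
freqs = word_frequencies(sample)
print(freqs.most_common(3))
print(per_million(freqs, "crime"))
```

Note that "commits" and "committed" are counted as separate forms here; grouping them under the base form "commit" would need a lemmatizer, which this sketch leaves out.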

How is the Corpus used?

When a dictionary editor wants to add a new word to COBUILD, they search the Corpus for every example of the word. The results appear on screen as a long list of lines, each showing the word in context, and the editors can sort these lines in different ways depending on what they want to look at. They can then analyze the word’s meaning and usage, its frequency, and its collocations.
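A list like this is usually called a concordance, or keyword-in-context (KWIC) display. The following minimal Python sketch, assuming plain-text input and a fixed character window on each side, shows how such lines can be built and then sorted by the words to the right or to the left of the search word. The names `concordance` and `show` are hypothetical and do not describe the actual Collins corpus tools.

```python
import re


def concordance(text: str, node: str, width: int = 30, sort_by: str = "right"):
    """Return keyword-in-context (KWIC) lines for every occurrence of `node`.

    Each line is a (left_context, keyword, right_context) tuple, where each
    context is at most `width` characters. `sort_by` is "right" (order by the
    words after the keyword), "left" (order by the words before it, nearest
    word first) or "none" (keep the order of the text).
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))

    if sort_by == "right":
        lines.sort(key=lambda line: line[2].lower().split())
    elif sort_by == "left":
        # Reverse the left-hand words so the word nearest the keyword is compared first.
        lines.sort(key=lambda line: line[0].lower().split()[::-1])
    elif sort_by != "none":
        raise ValueError(f"unknown sort_by value: {sort_by!r}")
    return lines


def show(lines, width: int = 30) -> None:
    """Print concordance lines with the keyword aligned in a central column."""
    for left, keyword, right in lines:
        print(f"{left:>{width}}  {keyword}  {right}")


sample = (
    "Share prices fell as the crisis deepened. The government handled the "
    "crisis badly. A looming crisis worries investors. Nobody could avoid the crisis."
)
show(concordance(sample, "crisis", sort_by="right"))
print()
show(concordance(sample, "crisis", sort_by="left"))
```

Sorting on the right-hand context groups patterns that follow the word (such as verbs describing what a crisis does), while sorting on the left groups what precedes it (such as adjectives like "looming").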

In addition, all of the examples in COBUILD dictionaries are examples of real English, taken from the Corpus. The Collins Corpus lies at the heart of our publishing for learners of English and you can be confident that COBUILD will show you what you need to know to be able to communicate easily and accurately in English.

Collins Technical Corpus

The Collins Technical Corpus forms part of the Collins Corpus and is made up of academic and professional journal articles from a variety of subject fields, including science and humanities.

COBUILD Collocations

COBUILD Collocations are combinations of words that are often used together. There are a number of different types of collocation. For example, a collocation can consist of a verb and a noun (accept a challenge), an adjective and a noun (a minor complaint) or a noun and a noun (an election campaign). In these examples, ‘accept’, ‘minor’ and ‘election’ are collocates of ‘challenge’, ‘complaint’ and ‘campaign’.

A corpus is very useful for finding the most common and significant collocations in a language. At Collins, we combine the Corpus with linguistic algorithms and lexicographic expertise to create lists of collocates for many words and to find examples that illustrate each one.
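The algorithms Collins uses are not described here, so the sketch below uses one widely known association measure, pointwise mutual information (PMI), purely as an illustration. It counts which words occur within a few tokens of a node word and scores each candidate by how much more often the pair appears together than chance alone would predict. The window size, minimum count, tokenizer and simplified PMI estimate (with no correction for window size) are all assumptions; real collocation tools often prefer measures such as log-likelihood, t-score or logDice, because raw PMI tends to over-reward rare words.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")  # assumed, simplistic tokenizer


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def collocates(tokens: list[str], node: str, window: int = 3, min_count: int = 2):
    """Rank words co-occurring with `node` within +/- `window` tokens.

    Returns (word, joint_count, pmi) tuples, highest PMI first. PMI is
    estimated as log2(joint * N / (f(node) * f(word))), a simplified form
    that ignores the size of the window.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    joint_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every other token inside the window around this occurrence.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                joint_freq[tokens[j]] += 1

    results = []
    for word, joint in joint_freq.items():
        if joint < min_count:  # skip rare pairs, which PMI tends to overrate
            continue
        pmi = math.log2(joint * n / (word_freq[node] * word_freq[word]))
        results.append((word, joint, pmi))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


text = (
    "The financial crisis deepened. Banks feared a financial crisis. "
    "The government said the crisis was over, but the crisis deepened. "
    "The markets and the banks disagreed."
)
for word, joint, pmi in collocates(tokenize(text), "crisis", window=2):
    print(f"{word:<10} {joint:>3} {pmi:6.2f}")
```

In a sample this small, function words such as "the" still rank fairly high; a system working on a large corpus would typically filter or down-weight them and would only then pass the candidate list to lexicographers for checking.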

Why are collocations useful?

Collocations are useful because they provide natural-sounding ‘chunks’ of language for your speech and writing. It is much more common, for example, to say ‘commit a crime’ than ‘do a crime’, or to talk about ‘a damaging effect’ than ‘a bad effect’.

Whenever you are tempted to use a common word like ‘small’, ‘important’, or ‘say’ in combination with another word, check the COBUILD Collocations panel. There may be a more suitable or interesting word to use: a niggling doubt, a pivotal moment, voice an opinion.

Collocations can also give you a range of vocabulary to talk about a particular subject. Imagine you are doing a piece of writing about a crisis. You could talk about what caused, provoked, or triggered the crisis, or how someone handled, defused, or managed to avoid it. If it is getting worse, you could say that it is a worsening or deepening crisis, or if it has not yet happened, that it is a looming crisis. How bad is the crisis? Is it a major crisis, a full-blown crisis, or a global crisis? You might also want to mention the type of crisis. Is it a financial crisis, a political crisis, an economic crisis, or is someone having an existential crisis?

The possibilities are endless…