The Cloud Application team at Tgix built a new dictionary database and ETL, a substantial challenge for a large, inconsistent dataset (as the team themselves said, “English is a hot mess”). Entries include many disparate parts: headword, pronunciation, parts of speech, related words, definitions, cross-references, example sentences, etymology, and more. The Tgix team spent weeks working closely with the M-W lexicographers to understand the existing structure and tagging, and then developed a database format that would provide both consistency and flexibility.
Tgix designed and implemented a comprehensive solution, consisting of: