MERRIAM-WEBSTER

CASE STUDY

Dictionary Database

COMPANY

Merriam-Webster is the premier English language reference in North America, with a strong search footprint and large share of the market: over 100 million pageviews/month on its core, free website and over 3 billion pageviews/year across all of its sites and apps. In addition to dictionary and thesaurus resources, Merriam-Webster features language content, word games, learning tools, a podcast, and a highly engaging Twitter presence.

Technologies Used

Core AWS (EC2, ALB, WAF, S3, ElastiCache for Redis, Route 53, ACM, CloudWatch)
MongoDB
Java / JDK
Spring
Tomcat
Maven

CHALLENGE

Dated product and development infrastructure was blocking innovation, speed, and flexibility — especially the use of print-centric XML files as the basis for the online dictionary.

As the company transitioned to digital-first along with their AWS migration, they wanted to be able to make and deploy changes quickly as well as build new products and features from their nearly 200-year trove of language data.

SOLUTION

The Cloud Application team at Tgix built a new dictionary database and ETL pipeline, a substantial challenge for a large, inconsistent dataset (as the team themselves put it, “English is a hot mess”). Each entry has many disparate parts: headword, pronunciation, parts of speech, related words, definitions, cross-references, example sentences, etymology, and more. The Tgix team spent weeks working closely with the M-W lexicographers to understand the existing structure and tagging, and then developed a database format that would provide both consistency and flexibility.

Tgix designed and implemented a comprehensive solution, consisting of:
A flexible dictionary database that supports all the intricacies and vagaries of dictionary data
Complex ETL processes for extracting, transforming, and loading dictionary data from the legacy XML files into the new dictionary database
A comprehensive set of web services for on-the-fly querying of the dictionary database by consumer-facing web and mobile apps
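
To make the extract-and-transform step concrete, here is a minimal sketch in Java using only the standard library. It is not Tgix's implementation: the XML tag names (entry, hw, pos, def, ex), the output field names, and the sample entry are all hypothetical, chosen only to illustrate how a print-centric XML entry might be reshaped into a nested, document-style record of the kind a document database such as MongoDB stores. The load step is omitted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LegacyEntryEtl {

    // Extract + transform: parse one legacy XML entry into a nested map.
    // All tag and field names here are assumptions, not the real M-W schema.
    public static Map<String, Object> transform(String xml) throws Exception {
        Element entry = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("headword", firstText(entry, "hw"));
        doc.put("partOfSpeech", firstText(entry, "pos"));
        // Repeating elements become lists, so an entry may carry any number
        // of definitions or examples, including none.
        doc.put("definitions", allText(entry, "def"));
        doc.put("examples", allText(entry, "ex"));
        return doc;
    }

    // Returns the text of the first matching element, or null if absent.
    private static String firstText(Element parent, String tag) {
        List<String> values = allText(parent, tag);
        return values.isEmpty() ? null : values.get(0);
    }

    // Returns the trimmed text of every descendant element with this tag.
    private static List<String> allText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample entry, for illustration only.
        String xml = "<entry><hw>pandemic</hw><pos>noun</pos>"
                + "<def>an outbreak of a disease over a wide area</def>"
                + "</entry>";
        System.out.println(transform(xml));
    }
}
```

Treating optional parts as nullable fields or empty lists is one way to get the mix of consistency and flexibility described above: every record has the same shape, yet entries are free to omit parts they do not have.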

RESULTS

Updates and deployments that ran 10x faster saved hours of key engineering time, which was shifted to new product development, accelerating innovation and growth. This infrastructure update also enabled Merriam-Webster’s historically unprecedented speed in adding words related to the Covid-19 pandemic in March 2020. As Slate reported, “Recent upgrades in Merriam’s data-processing system had shaved the time needed to add new entries from weeks to hours.”

TECHNOLOGY USED

Jenkins
Terraform
Ansible
GitLab

If you’re dealing with complex infrastructure, security requirements, or deployment speeds, or are looking for cost efficiencies, contact us today for a no-obligation brainstorm.

Tgix

© Copyright 2022 - Tgix - All Rights Reserved