10x Faster Updates from a Database Upgrade for Merriam-Webster

COMPANY

Merriam-Webster is the premier English language reference in North America, with a strong search footprint and large share of the market: over 100 million pageviews/month on its core, free website and over 3 billion pageviews/year across all of its sites and apps. In addition to dictionary and thesaurus resources, Merriam-Webster features language content, word games, learning tools, a podcast, and a highly engaging Twitter presence.

CHALLENGE

Dated product and development infrastructure was blocking innovation, speed, and flexibility. A particular obstacle was the use of print-centric XML files as the basis for the online dictionary.

As the company moved to a digital-first model alongside its AWS migration, it wanted to make and deploy changes quickly and to build new products and features from its nearly 200-year trove of language data.

SOLUTION

The Cloud Application team at Tgix built a new dictionary database and ETL pipeline, a substantial challenge for a large, inconsistent dataset (as the team themselves said, “English is a hot mess”). Entries include many disparate parts: headword, pronunciation, parts of speech, related words, definitions, cross-references, example sentences, etymology, and more. The Tgix team spent weeks working closely with the M-W lexicographers to understand the existing structure and tagging, and then developed a database format that would provide both consistency and flexibility.

Tgix designed and implemented a comprehensive solution, consisting of:

  • A flexible dictionary database that supported all the intricacies and vagaries of dictionary data
  • Complex ETL processes for extracting, transforming, and loading dictionary data from the legacy XML files into the new dictionary database
  • A comprehensive set of web services for on-the-fly querying of the dictionary database by consumer-facing web and mobile apps
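
To give a feel for the transform step in the ETL described above, here is a minimal sketch in Java that parses one XML entry into a nested, document-style record. The tag names (`hw`, `pos`, `def`), the field names, and the class `EntryTransformSketch` are all hypothetical, since the actual legacy tagging and database schema are not described here. The sketch uses only the JDK's built-in XML parser and stops short of the load step; in a real pipeline the resulting map would be handed to a database driver.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EntryTransformSketch {

    // Hypothetical XML layout, invented for illustration only.
    static final String SAMPLE_XML =
        "<entry><hw>example</hw><pos>noun</pos>"
        + "<def>one that serves as a pattern</def>"
        + "<def>a representative instance</def></entry>";

    /** Parses a single entry element into a nested map shaped like a document-store record. */
    static Map<String, Object> toDocument(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));
        Element entry = dom.getDocumentElement();

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("headword", firstText(entry, "hw"));
        doc.put("partOfSpeech", firstText(entry, "pos"));
        // Repeated elements become a list, so entries with one or many senses share one shape.
        doc.put("definitions", allText(entry, "def"));
        return doc;
    }

    /** Returns the trimmed text of the first matching descendant, or null if there is none. */
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Returns the trimmed text of every matching descendant, in document order. */
    static List<String> allText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toDocument(SAMPLE_XML));
    }
}
```

Representing optional parts as nullable fields and repeated parts as lists is one plausible way to get the consistency and flexibility mentioned above. It is an assumption of this sketch, not a description of the schema Tgix built.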

RESULTS

Updates and deployments became 10x faster, saving hours of key engineering time that was shifted to new product development, accelerating innovation and growth. This infrastructure update also enabled Merriam-Webster to add words related to the Covid-19 pandemic in March 2020 with unprecedented speed. As Slate reported, “Recent upgrades in Merriam’s data-processing system had shaved the time needed to add new entries from weeks to hours.”

Technologies Used

  • Core AWS (EC2, ALB, WAF, S3, ElastiCache for Redis, Route 53, ACM, CloudWatch)
  • MongoDB
  • Java / JDK
  • Spring
  • Tomcat
  • Maven

If you’re dealing with complex infrastructure, security requirements, or slow deployments, or are looking for cost efficiencies, contact us today for a no-obligation brainstorm.
