Both the state of the art in machine translation and the underlying MT technology are evolving continuously, so practitioners need to understand current research and keep up with it to build viable systems. If public MT services easily outperform home-built systems, employees and partners have little incentive to use the in-house system. The likely result is either rogue behavior, with users bypassing the in-house system, or users being forced to work with a substandard one. This is especially true for localization use cases, where the highest output quality is demanded.
Producing systems that consistently perform at the required levels demands deep expertise and broad experience. At a minimum, do-it-yourself (DIY) practitioners need basic expertise in the disciplines that surround machine learning technology.
While open-source toolkits do provide access to the same algorithms, the essential skill in building MT systems is proper data analysis, data preparation and data cleansing, which ensure that the algorithms learn from a sound, high-quality foundation. The most skillful developers also understand the unique requirements of different use cases and can develop additional tools and processes to augment and enhance MT-related tasks. For many use cases, the heavy lifting is done outside and around the neural MT models.
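To make the data-cleansing point concrete, here is a minimal Python sketch of the kind of heuristic filtering often applied to a parallel corpus before training. The function name `clean_corpus`, the specific filters and the thresholds (`max_tokens`, `max_ratio`) are illustrative assumptions, not a recommended recipe; real pipelines usually tune them per language pair and domain and often add further checks such as language identification.

```python
"""Sketch of heuristic parallel-corpus cleaning before NMT training.

Filters and thresholds are illustrative assumptions, not a prescribed recipe.
"""
import unicodedata
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]


def normalize(text: str) -> str:
    # Apply Unicode NFC normalization and collapse runs of whitespace.
    return " ".join(unicodedata.normalize("NFC", text).split())


def clean_corpus(
    pairs: Iterable[Pair],
    max_tokens: int = 200,   # hypothetical cap on segment length (whitespace tokens)
    max_ratio: float = 3.0,  # hypothetical cap on the source/target length ratio
) -> Iterator[Pair]:
    seen = set()
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop segments copied verbatim to the target (likely untranslated).
        # Note: this also drops legitimate identical pairs such as numbers.
        if src == tgt:
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop overly long segments.
        if n_src > max_tokens or n_tgt > max_tokens:
            continue
        # Drop pairs with implausibly mismatched lengths (likely misaligned).
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Drop exact duplicate pairs.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        yield src, tgt


raw = [
    ("Hello world.", "Hallo Welt."),
    ("Hello world.", "Hallo Welt."),  # duplicate
    ("", "Leere Quelle"),             # empty source
    ("Press OK.", "Press OK."),       # copied, untranslated
    ("Yes", "Dies ist eine viel zu lange und vermutlich falsch ausgerichtete Übersetzung."),
]
print(list(clean_corpus(raw)))  # → [('Hello world.', 'Hallo Welt.')]
```

Whitespace token counts are only a rough length proxy; for languages written without spaces, such as Chinese or Japanese, a character-based or tokenizer-based length would be needed instead.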
Over the last few years, the consensus on which NMT algorithms are the “best” has shifted regularly. A machine translation system deployed at enterprise scale requires an “all-in,” long-term commitment, or it is doomed to become a failed experiment:
Building engineering teams that understand which research is most valid and relevant, and then upgrading and refreshing existing systems accordingly, is a significant, ongoing, long-term investment.
Keeping up with developments in the research community requires constant experimentation and testing, an expense most practitioners will find hard to justify.
Practitioners must know when and why to change course as the technology evolves, or risk being stuck with suboptimal systems.
Open-source initiatives that emerge from academic environments, such as Moses, face their own challenges. They often stagnate when the key students who set up the initial toolkits graduate and are hired away. The core research team may also move on to other research with more academic stature and potential. These shifting priorities can force DIY MT practitioners to switch toolkits at great expense, in both time and redundant resources.
To better understand the gap between a basic open-source MT toolkit and enterprise MT requirements, consider why an organization would choose an enterprise-grade content management system (CMS) to set up its corporate website instead of a tool like WordPress. While both could help the organization build and deploy a corporate web presence, an enterprise CMS is likely to offer specialized capabilities that make it much more suitable for enterprise use.
As enterprises better understand the global communication, collaboration and content-sharing imperatives of modern digital transformation (DX) initiatives, many recognize that MT is now a critical technology building block for successful DX. However, enterprise MT systems face many specialized requirements, including data security and confidentiality, adaptation to different business use cases, and the ability to deploy across a broad range of enterprise scenarios.
Enterprise optimization is also an increasingly critical criterion when selecting such a core technology. For a global business, MT is increasingly mission-critical, and choosing it deserves the same care and attention as choosing enterprise CMS, email and database systems.