Installing Moses: The Statistical Machine Translation Tool
What is machine translation?
Machine translation is the task of translating text from one language, say English, into another, for instance Japanese. There are two major techniques for this task: Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). The first approach casts translation as a noisy channel model, while the second uses a sequence-to-sequence deep learning method.
In this post, we will cover the statistical approach. Specifically, we will walk through the installation of one of the most widely used SMT toolkits: the mosesdecoder.
Prerequisites
g++
git
subversion
automake
libtool
zlib1g-dev
libboost-all-dev
libbz2-dev
liblzma-dev
python-dev
graphviz
imagemagick
make
cmake
libgoogle-perftools-dev
autoconf
doxygen
p7zip
gawk
Step-1: Installing the prerequisites
Put all of these required packages in a file named requirements.txt and install them with a single command:
sudo apt-get install $(cat requirements.txt)
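As a minimal sketch, the package list from the Prerequisites section can be written to requirements.txt with a here-document. This assumes you run it in the directory where you want the file to live; it only creates the file and does not install anything.

```shell
# Write the prerequisite package names, one per line, to requirements.txt.
cat > requirements.txt <<'EOF'
g++
git
subversion
automake
libtool
zlib1g-dev
libboost-all-dev
libbz2-dev
liblzma-dev
python-dev
graphviz
imagemagick
make
cmake
libgoogle-perftools-dev
autoconf
doxygen
p7zip
gawk
EOF

# Report how many packages were listed before installing them.
echo "Listed $(grep -c . requirements.txt) packages in requirements.txt"
```

Once the file exists, the apt-get command above installs everything in it.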
Step-2: Installation of an alignment tool
We will install mgiza++, an alignment tool that is the multi-threaded version of giza++.
Clone the mgiza++ repo:
mkdir tools
cd tools
git clone https://github.com/moses-smt/mgiza.git
Building the repo
cd mgiza/mgizapp
cmake .
make
make install
Step-3: Language modelling toolkit installation
Mosesdecoder comes with KenLM as the default language modelling tool. However, we will install a third-party language modelling tool, SRILM.
First, change back to the tools/ directory and create a directory for SRILM:
mkdir srilm
cd srilm
(This step is crucial, since the SRILM archive expands into the current directory rather than into a sub-directory.)
Register on the SRILM download page, download the latest release (a .tgz file), and move it into the directory you just created. Then extract it:
tar -xzvf <srilm-version.tgz>
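To see why the dedicated directory matters, you can list the top-level entries of an archive without extracting it. The helper below is a small sketch; top_level_entries is a hypothetical name, not part of SRILM. An archive that prints several names (rather than a single directory) will unpack directly into the current directory.

```shell
# top_level_entries ARCHIVE: list the distinct top-level names in a .tgz archive.
top_level_entries() {
    tar -tzf "$1" |        # list archive members without extracting
        sed 's|^\./||' |   # drop a leading "./" if present
        cut -d/ -f1 |      # keep only the first path component
        sort -u |          # de-duplicate
        sed '/^$/d'        # drop empty lines
}

# Usage (replace the placeholder with your downloaded file):
#   top_level_entries <srilm-version.tgz>
```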
Next, set the SRILM path in the top-level Makefile so that it points to the main SRILM directory. The line to be modified looks like this:
SRILM = /home/speech/stolcke/project/srilm/devel
which should be modified to
SRILM = <path/to/your/srilm/main/directory>
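If you prefer not to edit the file by hand, the change can be scripted with sed. This is a sketch under a few assumptions: set_srilm_path is a hypothetical helper name, the SRILM assignment sits at the start of a line (possibly commented out with a single #), and the target path contains no | or & characters.

```shell
# set_srilm_path MAKEFILE DIR: point the SRILM variable in MAKEFILE at DIR.
set_srilm_path() {
    makefile=$1
    srilm_dir=$2
    # Rewrite the SRILM assignment whether or not it is commented out,
    # writing to a temporary file and then replacing the original.
    sed "s|^#\{0,1\}[[:space:]]*SRILM[[:space:]]*=.*|SRILM = ${srilm_dir}|" \
        "$makefile" > "${makefile}.tmp" &&
        mv "${makefile}.tmp" "$makefile"
}

# Usage, from inside the srilm directory:
#   set_srilm_path Makefile "$PWD"
```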
Then build with:
make World
Step-4: Building boost
Change back to the tools/ directory. This installation uses Boost version 1.64.0 (boost_1_64_0).
wget https://boostorg.jfrog.io/artifactory/main/release/1.64.0/source/boost_1_64_0.tar.gz
tar -xvzf boost_1_64_0.tar.gz
cd boost_1_64_0/
./bootstrap.sh
./b2 -j4 --prefix=$PWD --libdir=$PWD/lib64 --layout=system link=static install || echo FAILURE
NOTE: -j4 enables a parallel build, where 4 is the number of simultaneous tasks.
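Rather than hard-coding 4, the job count can be matched to the machine. The sketch below uses a hypothetical helper, job_count, which asks nproc (or getconf) for the number of available cores and falls back to 4 if neither is available.

```shell
# job_count: print the number of CPU cores, or 4 if it cannot be determined.
job_count() {
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
    echo "$n"
}

# Usage with the build command above:
#   ./b2 -j"$(job_count)" --prefix=$PWD --libdir=$PWD/lib64 --layout=system link=static install
```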
Step-5: Compiling Moses with the language model (SRILM)
From the tools/ directory:
git clone https://github.com/moses-smt/mosesdecoder.git
cd mosesdecoder
./bjam --with-srilm=<path/to/Srilm> --with-boost=<path/to/boost> -j4
If building and linking succeed, a SUCCESS message is shown at the end.
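As an extra sanity check, you can confirm that the decoder binary was produced. This sketch assumes the build places the executable at bin/moses inside the mosesdecoder checkout; moses_built is a hypothetical helper name.

```shell
# moses_built DIR: succeed if DIR/bin/moses exists and is executable.
moses_built() {
    [ -x "$1/bin/moses" ]
}

# Usage, from inside the mosesdecoder directory:
#   if moses_built .; then echo "moses binary found"; else echo "moses binary missing"; fi
```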
To sum up, the directory tree should look like this:
|-tools
| |-mgiza
| |-srilm
| |-boost_1_64_0
| |-mosesdecoder
In the next post, we will walk through the training steps for an SMT system, from data acquisition and data processing through training to model evaluation.