Automatically exported from
Latest commit 402ae80561 by tmikolov, 2013-08-01 19:36:25 +00:00

We provide an implementation of the Continuous Bag-of-Words (CBOW) and Skip-Gram (SG) models.
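To make the difference between the two models concrete: Skip-Gram treats each (center word, context word) pair inside the window as one training example and predicts the context word from the center word, while CBOW combines the input vectors of all context words into one hidden vector and predicts the center word from it. The sketch below shows only this data flow, not the training step. The function names are hypothetical. The fixed window is a simplification (word2vec.c randomly shortens the window at each position), and the sketch averages the context vectors, whereas the released code may simply sum them.

```c
#include <assert.h>
#include <stddef.h>

/* Collect the sentence positions of the context words around `center`,
 * using up to `window` words on each side and clamping at the sentence
 * boundaries. Writes the positions to `out` and returns how many there
 * are. For Skip-Gram, each (center, out[k]) pair is one training example. */
size_t context_positions(size_t sentence_len, size_t center, size_t window,
                         size_t *out) {
    assert(center < sentence_len);
    size_t n = 0;
    size_t lo = center >= window ? center - window : 0;
    size_t hi = center + window < sentence_len ? center + window
                                               : sentence_len - 1;
    for (size_t p = lo; p <= hi; p++) {
        if (p != center) out[n++] = p; /* the center word is not context */
    }
    return n;
}

/* CBOW-style hidden layer (sketch): average the input vectors of the
 * context words. `vecs` is a row-major [vocab x dim] matrix, `sentence`
 * holds vocabulary indices, and `ctx` holds positions in the sentence. */
void cbow_hidden(const float *vecs, size_t dim, const size_t *sentence,
                 const size_t *ctx, size_t n_ctx, float *hidden) {
    for (size_t d = 0; d < dim; d++) hidden[d] = 0.0f;
    for (size_t k = 0; k < n_ctx; k++) {
        const float *v = vecs + sentence[ctx[k]] * dim;
        for (size_t d = 0; d < dim; d++) hidden[d] += v[d];
    }
    if (n_ctx > 0)
        for (size_t d = 0; d < dim; d++) hidden[d] /= (float)n_ctx;
}
```

For example, in a 10-word sentence with a window of 5, position 0 has only the 5 words to its right as context, so it yields 5 Skip-Gram pairs but a single CBOW example.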

Given a text corpus, the word2vec program learns a vector for every word using the Continuous
Bag-of-Words or the Skip-Gram model.  The user needs to specify the following:
 - the desired vector dimensionality (-size)
 - the size of the context window for either the Skip-Gram or the Continuous Bag-of-Words model (-window)
 - whether hierarchical softmax is used (-hs)
 - whether negative sampling is used, and if so, how many negative samples to draw per training example (-negative)
 - a threshold for downsampling frequent words (-sample)
 - the number of threads to use (-threads)
 - whether to save the vectors in text or binary format (-binary)

Thus the program requires only a modest number of parameters; in particular, the learning rate
need not be selected by hand.
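As we understand word2vec.c, the learning rate does not have to be chosen because the program starts from a built-in default and decreases it linearly with the number of words processed, never letting it drop below 0.01% of the starting value. The function below is a hypothetical sketch of that schedule; its name and parameters are ours, and the starting values used in the checks are illustrative.

```c
#include <assert.h>

/* Linearly decayed learning rate (sketch).
 * words_processed: training words seen so far
 * total_words:     training words the run will see in total
 * The rate falls from starting_alpha toward zero, but is clamped at
 * starting_alpha * 0.0001 so updates never stop entirely. */
float decayed_alpha(float starting_alpha, long long words_processed,
                    long long total_words) {
    assert(words_processed >= 0 && total_words >= 0);
    float progress = (float)words_processed / (float)(total_words + 1);
    float alpha = starting_alpha * (1.0f - progress);
    float floor_alpha = starting_alpha * 0.0001f;
    return alpha < floor_alpha ? floor_alpha : alpha;
}
```

Halfway through the corpus the rate is half its starting value; at the end it sits at the floor.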

The demo script downloads a small (100MB) text corpus and trains a 200-dimensional CBOW model
with a window of size 5, negative sampling with 5 negative samples, a downsampling threshold of 1e-3, 12 threads, and binary output:

./word2vec -train text8 -output vectors.bin -cbow 1 -size 200 -window 5 -negative 5 -hs 0 -sample 1e-3 -threads 12 -binary 1
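With -binary 1, vectors.bin begins with a text header giving the vocabulary size and the vector dimensionality, followed by each word, a single space, its vector as raw `float`s, and a newline. This is our reading of how word2vec.c writes the file and how distance.c reads it back. The loader below is a minimal sketch under that assumption; `Embeddings`, `load_binary_vectors`, and the `MAX_WORD_LEN` limit are hypothetical names, and the file is assumed to come from a machine with the same `float` layout and byte order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 100 /* assumed upper bound on word length */

typedef struct {
    long long words, size;
    char *vocab;  /* words * MAX_WORD_LEN chars, NUL-terminated words */
    float *vecs;  /* words * size floats, row-major */
} Embeddings;

/* Read a file written with -binary 1. Returns 0 on success, -1 on error.
 * The caller frees e->vocab and e->vecs in either case (both start NULL). */
int load_binary_vectors(FILE *f, Embeddings *e) {
    e->vocab = NULL;
    e->vecs = NULL;
    if (fscanf(f, "%lld %lld", &e->words, &e->size) != 2) return -1;
    if (e->words <= 0 || e->size <= 0) return -1;
    e->vocab = calloc((size_t)e->words, MAX_WORD_LEN);
    e->vecs = malloc((size_t)(e->words * e->size) * sizeof(float));
    if (!e->vocab || !e->vecs) return -1;

    for (long long w = 0; w < e->words; w++) {
        char *word = e->vocab + w * MAX_WORD_LEN;
        int c, len = 0;
        /* skip the newline left after the header or the previous vector */
        while ((c = fgetc(f)) == '\n' || c == ' ') {}
        /* the word runs up to the single space before its vector */
        while (c != EOF && c != ' ') {
            if (len < MAX_WORD_LEN - 1) word[len++] = (char)c;
            c = fgetc(f);
        }
        word[len] = '\0';
        if (c == EOF) return -1;
        /* the vector itself is stored as raw binary floats */
        if (fread(e->vecs + w * e->size, sizeof(float), (size_t)e->size, f)
            != (size_t)e->size)
            return -1;
    }
    return 0;
}
```

Open the file in binary mode (`fopen("vectors.bin", "rb")`) so the raw floats are read unchanged on every platform.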

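Then, to evaluate the quality of the vectors, run compute-accuracy on the questions in
questions-words.txt. This runs a battery of tests that measure how well the vectors solve
linear analogies (for example, "man" is to "king" as "woman" is to "queen"). The optional
second argument, here 30000, restricts the evaluation to the most frequent words to make it faster:

./compute-accuracy vectors.bin 30000 < questions-words.txt

To interactively list the nearest neighbors of a word by cosine distance, run:

./distance vectors.bin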