word2vec/README.txt

Tools for computing distributed representations of words
--------------------------------------------------------

We provide an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram model (SG), as well as several demo scripts.

Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous
Bag-of-Words or the Skip-gram neural network architecture. The user should specify the following:
 - desired vector dimensionality
 - the size of the context window for either the Skip-gram or the Continuous Bag-of-Words model
 - training algorithm: hierarchical softmax and/or negative sampling
 - threshold for downsampling frequent words
 - number of threads to use
 - the format of the output word vector file (text or binary)
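
As a rough illustration of how the options above map onto a single invocation, the Python sketch below assembles
an argument list for the word2vec binary. The helper name build_word2vec_command, the binary path ./word2vec, the
file names, and all default values are hypothetical. The flag names are assumed to match the word2vec.c source
shipped with the tool; running the binary without arguments should print the options it actually supports.

```python
def build_word2vec_command(train_file, output_file, size=200, window=5,
                           cbow=0, hs=1, negative=0, sample=1e-3,
                           threads=12, binary=1):
    """Assemble an argv list for the word2vec binary.

    Flag names are an assumption based on word2vec.c; the default
    values here are illustrative only, not recommended settings.
    """
    return [
        "./word2vec",               # assumed location of the compiled tool
        "-train", train_file,       # input text corpus
        "-output", output_file,     # where the word vectors are written
        "-cbow", str(cbow),         # 1 = CBOW, 0 = Skip-gram
        "-size", str(size),         # vector dimensionality
        "-window", str(window),     # context window size
        "-hs", str(hs),             # 1 = use hierarchical softmax
        "-negative", str(negative), # negative samples (0 = not used)
        "-sample", str(sample),     # downsampling threshold for frequent words
        "-threads", str(threads),   # number of threads
        "-binary", str(binary),     # 1 = binary output, 0 = text output
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before it is run.
    print(" ".join(build_word2vec_command("text8", "vectors.bin")))
```

The list can be passed to subprocess.run once the binary has been compiled and the corpus is in place.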

Usually, the other hyper-parameters, such as the learning rate, do not need to be tuned for different training sets.

The script demo-word.sh downloads a small (100MB) text corpus from the web and trains a small word vector model. After
training finishes, the user can interactively explore the similarity of the words.
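
To give a sense of what exploring word similarity can look like, the Python sketch below ranks words by cosine
similarity to a query word. Cosine similarity is assumed here as the measure, and the three-dimensional vectors
are made-up toy values, not output of a trained model. The names cosine, nearest, and toy_vectors are hypothetical
and are not part of the tool.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, vectors, top_n=3):
    """Return up to top_n (word, similarity) pairs closest to query."""
    q = vectors[query]
    scored = [(word, cosine(q, vec))
              for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors for illustration only; real vectors come from a trained model.
toy_vectors = {
    "king": [0.9, 0.1, 0.0],
    "queen": [0.8, 0.2, 0.1],
    "apple": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for word, score in nearest("king", toy_vectors):
        print(word, round(score, 3))
```

With vectors written in the text output format, the same functions could be applied after parsing each line into a
word and a list of floats.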

More information about the scripts is provided at https://code.google.com/p/word2vec/