An overview of Natural Language Processing and Linguistics
Human language
A human language is a symbolic signaling system. Most words are just symbols for an extra-linguistic entity: the word is a signifier that maps to a signified (idea or thing).
The symbols of language can be encoded in several modalities, such as voice, gesture, and writing, and they reach the brain as continuous signals. Studying these continuous signals therefore leads into psychology and cognitive science (questions like "What is thought?", and even the foundational challenge of the Turing Test).
The core questions about human language span its architecture, its connection to cognitive science, and its details: how it is learned, how it is processed, and how it changes over time.
About Languages
- phonetics
- phonology - the study of the sound patterns of human languages
  - word stress - in English, vowels in unstressed syllables are often reduced to schwa /ə/
  - To produce a stressed syllable, a speaker may change the pitch, make the syllable louder, or make it longer
  - Intonation may reflect syntactic or semantic differences
- morphology - the rules of word formation
  - Morphemes - the minimal units of meaning
- syntax
- semantics - the linguistic meaning
- pragmatics - how context affects meaning
Word vectors
- represent words as vectors based on the contexts in which they occur
- Count-based representations, such as TF-IDF (sparse)
- Distributed representations (dense)
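As a rough illustration of the count-based approach, the sketch below computes TF-IDF weights by hand. It assumes one common variant (term frequency normalized by document length, and an unsmoothed `log(N / df)` inverse document frequency); libraries often use smoothed variants. The function name `tf_idf` and the toy corpus are made up for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # tf = relative frequency in this document, idf = log(N / df)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]
vectors = tf_idf(docs)
print(vectors[0])
```

In this toy corpus, "cat" appears in only one document, so it receives a higher weight in the first document than "sat", which appears in two.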
Word embeddings
- Word semantic meaning
- hypernym (is-a) relationships and synonym sets (as in WordNet)
- Word vectors encode valuable semantic information. For example, the Word2Vec model predicts a word from its neighbors, or the neighbors from the word, relying on the linguistic hypothesis of distributional similarity (similar words appear in similar contexts).
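The distributional hypothesis can be illustrated without training a neural model. The sketch below is not Word2Vec: it simply counts the neighbors of each word within a fixed window and compares the resulting count vectors with cosine similarity. The helper names, the window size, and the toy sentences are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus: "cat" and "dog" occur in similar contexts.
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the ball".split(),
]
vecs = cooccurrence_vectors(sentences)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_milk = cosine(vecs["cat"], vecs["milk"])
print(sim_cat_dog, sim_cat_milk)
```

Because "cat" and "dog" share most of their neighbors here, their similarity comes out higher than that of "cat" and "milk", even though "cat" and "milk" co-occur directly.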
Language Modeling - a probabilistic model of word sequences
- N-gram
- Finite automata and RNN
- LSTM Networks
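As a minimal sketch of the N-gram idea, the code below estimates a bigram model by maximum likelihood, so that the probability of a sentence is the product of P(word | previous word). It assumes no smoothing, which means any unseen bigram gives the whole sentence probability zero. The `<s>` and `</s>` boundary markers, the function names, and the toy corpus are illustrative choices.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over sentences padded with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])             # every token that acts as a context
        bigrams.update(zip(padded, padded[1:]))  # (previous word, word) pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev), with no smoothing."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(unigrams, bigrams, tokens):
    """Probability of a sentence as a product of bigram probabilities."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(unigrams, bigrams, prev, word)
    return prob

# Hypothetical toy corpus.
corpus = [
    "i like nlp".split(),
    "i like linguistics".split(),
    "you like nlp".split(),
]
unigrams, bigrams = train_bigram(corpus)
p = sentence_prob(unigrams, bigrams, "i like nlp".split())
print(p)
```

An RNN or LSTM language model replaces these count-based estimates with a learned function of the whole preceding sequence, rather than only the previous word.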
Parsing and tree structure