An overview of Natural Language Processing and Linguistics

Human language

A human language is a symbolic signaling system. Most words are just symbols for an extra-linguistic entity: the word is a signifier that maps to a signified (idea or thing).

The symbols of language can be encoded in several forms, such as voice, gesture, and writing, which reach the brain as continuous signals. Studying these continuous signals therefore leads to psychological and cognitive questions (such as "What is thought?") and even to a foundational challenge of the field: the Turing Test.

The core questions about human language span its problems, architectures, and cognitive science, as well as the details of language itself: how it is learned, how it is processed, and how it changes over time.

About Languages

  • phonetics - the study of speech sounds: how they are produced, transmitted, and perceived
  • phonology - the study of the sound patterns of human languages

    • word stress - in English, vowels in unstressed syllables are often reduced to schwa /ə/
    • To produce a stressed syllable, one may change the pitch, make the syllable louder, or make it longer
    • Intonation may reflect syntactic or semantic differences
  • morphology - rules of word formation

    • Morphemes - the smallest units of meaning
  • syntax - rules for combining words into phrases and sentences
  • semantics - the study of linguistic meaning

    • lexical semantics - the meaning of individual words
  • pragmatics - how context affects meanings

Word vectors

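  • represent each word as a vector; modern methods produce dense vectors based on the contexts in which words appear
  • Count-based methods, such as TF-IDF weighting of term counts (typically sparse)
  • Distributed representations (dense, low-dimensional vectors learned from data)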
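To make the count-based idea concrete, here is a minimal TF-IDF sketch in plain Python. It assumes whitespace tokenization, tf = count / document length, and idf = log(N / df); this is only one common variant (libraries often use smoothed or log-scaled versions), and the three toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
vecs = tf_idf(docs)
# Sorted by weight: terms unique to this document ("cat", "mat") rank highest.
print(sorted(vecs[0].items(), key=lambda kv: -kv[1]))
```

Terms that occur in many documents get a small idf and are down-weighted, while distinctive terms are boosted. Each document ends up as a sparse vector with one possible dimension per vocabulary term, which is the contrast with the dense distributed representations listed above.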

Word embeddings

  • Representing word meaning
    • Lexical resources such as WordNet encode hypernym (is-a) relationships and synonym sets (synsets)
    • Word vectors encode valuable semantic information. For example, the Word2Vec model learns to predict a word from its neighbors, or the neighbors from the word. It relies on the distributional hypothesis: words that appear in similar contexts tend to have similar meanings.
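The distributional hypothesis can be illustrated without any training. The sketch below is a simple count-based co-occurrence model, not Word2Vec (which learns dense vectors by prediction). It assumes a symmetric context window of 2 words and uses cosine similarity, and the toy corpus is invented. Words that share contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = [
    "i drink hot coffee every morning",
    "i drink hot tea every morning",
    "the car needs new tires",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["coffee"], vecs["tea"]))  # → 1.0 (identical contexts)
print(cosine(vecs["coffee"], vecs["car"]))   # → 0.0 (no shared contexts)
```

"coffee" and "tea" never co-occur, yet they come out identical because their neighbors are the same. Word2Vec exploits the same signal but compresses it into short dense vectors.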

Language Modeling - a probabilistic model of word sequences

  • N-gram models
  • Finite automata and recurrent neural networks (RNNs)
  • Long short-term memory (LSTM) networks
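As an example of the simplest entry above, here is a minimal bigram (N = 2) language model. It assumes whitespace tokenization, sentence markers `<s>` and `</s>` (a common convention, chosen here), and add-one (Laplace) smoothing. Add-one is the simplest smoothing method, not the best one, and the training sentences are made up. The model scores a sentence as the sum of log P(w_i | w_{i-1}).

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing.

    P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + V)
    where V is the number of distinct tokens that can be predicted (words and </s>).
    """

    def __init__(self, sentences):
        self.unigrams = Counter()  # how often each token serves as a context
        self.bigrams = Counter()   # how often each (prev, word) pair occurs
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.vocab.update(tokens[1:])       # <s> is never predicted
            self.unigrams.update(tokens[:-1])   # every token that precedes another
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        self.vocab_size = len(self.vocab)

    def prob(self, prev, word):
        """Smoothed conditional probability P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def log_prob(self, sentence):
        """Log-probability of a whole sentence, including the end marker."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))

lm = BigramLM(["the cat sat", "the dog sat", "a cat ran"])
# A word order seen in training scores higher than a scrambled one.
print(lm.log_prob("the cat sat") > lm.log_prob("sat cat the"))  # → True
```

The N-gram model only looks one word back (a Markov assumption). RNNs and LSTMs replace these fixed counts with a learned hidden state that can, in principle, carry information from arbitrarily far back in the sequence.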

Parsing and tree structure
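Parsing recovers a tree structure over a sentence. As one illustration, the sketch below uses the CKY algorithm, a classic chart-parsing method that is not named in these notes. It assumes a grammar in Chomsky normal form (every rule is A → B C or A → word). The tiny grammar and lexicon are hypothetical, and the parser returns only the first tree it finds, as nested tuples.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
# Binary rules: (B, C) -> parents A such that A -> B C.
BINARY = {
    ("NP", "VP"): ["S"],
    ("Det", "N"): ["NP"],
    ("V", "NP"): ["VP"],
}
# Hypothetical lexical rules: word -> preterminals A such that A -> word.
LEXICON = {
    "the": ["Det"],
    "a": ["Det"],
    "dog": ["N"],
    "cat": ["N"],
    "chased": ["V"],
}

def cky_parse(words, start="S"):
    """Return one parse tree as nested tuples, or None if the sentence is not derivable."""
    n = len(words)
    # chart[i][j] maps a nonterminal to a tree it builds over words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i][i + 1][tag] = (tag, word)
    for span in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # every split point inside the span
                for left, right in product(chart[i][k], chart[k][j]):
                    for parent in BINARY.get((left, right), []):
                        # Keep only the first tree found for each nonterminal.
                        chart[i][j].setdefault(
                            parent, (parent, chart[i][k][left], chart[k][j][right])
                        )
    return chart[0][n].get(start)

print(cky_parse("the dog chased a cat".split()))
print(cky_parse("dog the chased".split()))  # → None (not derivable)
```

The first call yields `S` over `NP` ("the dog") and `VP` ("chased a cat"). Storing a backpointer tree in each chart cell is what turns the CKY recognizer into a parser. Real parsers also score competing trees (for example, with a probabilistic grammar) instead of keeping the first one found.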

reference