1. tokenize
https://www.nltk.org/_modules/nltk/tokenize.html
Supports sentence and word tokenization for 17 languages.
source code for the sentence tokenizer:
```python
from nltk.data import load

def sent_tokenize(text, language='english'):
    """
    Return a sentence-tokenized copy of *text*,
    using NLTK's recommended sentence tokenizer
    (currently :class:`.PunktSentenceTokenizer`
    for the specified language).

    :param text: text to split into sentences
    :param language: the model name in the Punkt corpus
    """
    # Load the pickled Punkt model for the requested language.
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
    return tokenizer.tokenize(text)
```
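The per-language dispatch above (one pickled Punkt model per language name) can be sketched without NLTK by using a plain dictionary lookup in place of `nltk.data.load`. Here `_MODELS` and `toy_sent_tokenize` are hypothetical stand-ins for illustration, not NLTK APIs, and the regex splitter is a naive substitute for the trained Punkt model.

```python
import re

# Hypothetical stand-in for the Punkt pickle lookup: maps a language
# name to a sentence-splitting callable, mirroring the
# 'tokenizers/punkt/{language}.pickle' path resolution.
_MODELS = {
    'english': lambda text: re.split(r'(?<=[.!?])\s+', text.strip()),
}

def toy_sent_tokenize(text, language='english'):
    # A missing key here parallels NLTK's LookupError for an
    # uninstalled Punkt model.
    tokenizer = _MODELS[language]
    return tokenizer(text)

print(toy_sent_tokenize("Hello world. How are you?"))
# → ['Hello world.', 'How are you?']
```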
word tokenizer:
```python
word_tokenize(text, language='english', preserve_line=False)
```
If `preserve_line` is False, the sentence tokenizer is called first and each sentence is then word-tokenized; if True, sentence splitting is skipped and the text is treated as a single line.
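The `preserve_line` branching can be sketched as follows. `toy_sent_split` and `toy_word_tokenize` are hypothetical names, and the regexes are naive stand-ins for NLTK's trained Punkt and Treebank tokenizers; only the control flow mirrors the real function.

```python
import re

def toy_sent_split(text):
    # Naive sentence splitter standing in for the Punkt model.
    return re.split(r'(?<=[.!?])\s+', text.strip())

def toy_word_tokenize(text, preserve_line=False):
    # preserve_line=False: split into sentences first, then tokenize
    # each sentence; preserve_line=True: skip sentence splitting.
    sentences = [text] if preserve_line else toy_sent_split(text)
    # Words or single punctuation marks, per sentence.
    return [tok for sent in sentences
            for tok in re.findall(r'\w+|[^\w\s]', sent)]

print(toy_word_tokenize("Hi there. Bye."))
# → ['Hi', 'there', '.', 'Bye', '.']
```

With this naive word pattern both settings yield the same tokens; in NLTK the difference shows up in how sentence-final periods and abbreviations are handled.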