Kubernetes Notes

Kubernetes cheatsheet and notes, written while learning and working on k8s-related projects.

Table of Contents: Basic Terms, Manifests, Expose A Service, Tools, Kubectl, Helm, Kubeadm, Minikube

Basic Terms

Some basic terminology used by Kubernetes:

cluster: a physical cluster of physical machines.
context: a group of access parameters: a cluster, a namespace, and a user. »

Author image J

PySpark

1. Initialization

   1. Spark session

          from pyspark.sql import SparkSession

          spark = SparkSession \
              .builder \
              .appName("name") \
              .config("spark.some.config.option", "some-value") \
              .getOrCreate()

   2. Spark context

      1. From the Spark session

             sc = spark.sparkContext

      2. From SparkContext directly

             from pyspark import SparkConf, SparkContext

             conf = SparkConf().setAppName("KMeans").setMaster("local[*]")
             sc = SparkContext(conf=conf)
             # or
             sc = SparkContext.getOrCreate(conf)

2. Partition by index

       # chained onto an RDD; needs: from itertools import islice
       .mapPartitionsWithIndex(lambda idx, it: islice(it, 1, None) if idx == 0 else it)\

   This gets rid of the first line of the file (the first record of the RDD). »
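To see why the `mapPartitionsWithIndex` lambda only removes the header, here is a minimal pure-Python sketch that does not need Spark. The `partitions` list and the helper name `drop_first_of_first_partition` are hypothetical stand-ins for an RDD's partitions; the `islice` logic is the same as in the note.

```python
from itertools import islice

def drop_first_of_first_partition(idx, it):
    # Same logic as the lambda passed to mapPartitionsWithIndex:
    # skip one element, but only in partition 0.
    return islice(it, 1, None) if idx == 0 else it

# Hypothetical data: a file with a header row, split into two partitions.
partitions = [["name,age", "ann,31"], ["bob,45", "cy,27"]]

rows = [
    row
    for idx, part in enumerate(partitions)
    for row in drop_first_of_first_partition(idx, iter(part))
]
print(rows)
```

Only partition 0 loses its first element, so this works as a header skip only when the header sits at the start of the first partition.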


Analysis of a few models

1. OLS

Ordinary Least Squares is the most common estimation method for linear models. It is good and simple; however, it relies on many assumptions:

1. The regression model is linear in the coefficients and the error term.
2. Observations are randomly sampled.
3. No multicollinearity (no independent variable is a linear combination of the others).
4. The error term is independent and identically distributed.
5. All independent variables are uncorrelated with the error term.

In the Brexit referendum data there are 110 variables, so there would be too much computation. »
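As a rough illustration of what OLS computes, here is a minimal sketch for the single-variable case, using the closed-form slope and intercept. The function name `ols_fit` and the toy data are assumptions made for this example, not taken from the post.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_fit(xs, ys)
print(intercept, slope)
```

With many variables the same idea is expressed in matrix form, and that is where the computational cost mentioned above comes from.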


Difference between Multi-Processing, Multi-threading and Coroutine

How is a computer program executed? A program needs at least one thread to run on. A coroutine lives in a thread, a thread lives in a process, a process runs on a core, and a core lives in a CPU.

Multi-processing usually refers to many processes executing in parallel. A process is the smallest unit of resource management; each process holds its own resources rather than sharing them with other processes.

Multi-threading usually refers to many threads executing concurrently; when there are idle cores, threads can use them to run in parallel. »
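The nesting described above (coroutines inside a thread, threads inside a process) can be sketched in Python using only the standard library. The function names `square`, `square_async`, and `run_coroutines` are hypothetical, and the example leaves out multi-processing to stay short.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # Runs in a worker thread; all threads share the process's memory.
    return n * n, threading.current_thread().name

async def square_async(n):
    # Hand control back to the event loop, as a coroutine does while waiting on I/O.
    await asyncio.sleep(0)
    return n * n, threading.current_thread().name

async def run_coroutines(values):
    # All coroutines are scheduled by one event loop inside one thread.
    return await asyncio.gather(*(square_async(v) for v in values))

with ThreadPoolExecutor(max_workers=3) as pool:
    thread_results = list(pool.map(square, [1, 2, 3]))

coroutine_results = asyncio.run(run_coroutines([1, 2, 3]))

thread_squares = [value for value, _ in thread_results]
coroutine_squares = [value for value, _ in coroutine_results]
coroutine_threads = {name for _, name in coroutine_results}
print(thread_squares, coroutine_squares, coroutine_threads)
```

The thread pool may spread the calls over several threads, while every coroutine reports the same thread name, because they all take turns on a single thread.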


Sublime ToolKit

Useful Sublime packages. Steps to install:

1. Open Sublime.
2. Open the command palette (`Ctrl+Shift+P` on Windows/Linux, `Cmd+Shift+P` on macOS).
3. Type `install`, search for _Package Control: Install Package_, and hit Enter.
4. Type the package name, find the package, and hit Enter.

Packages:

1. GitGutter: a very useful tool if you are working with a git repo
2. SublimeLinter & SublimeLinter-flake8: linter
3. KiteSublime: search documentation »


Python CheatSheet

This cheatsheet includes some very basic but easy-to-forget operations and some random notes. A good tutorial for beginners, in Chinese

1. Decorator

   Syntax:

       def decorator(func):
           def new_func(*args, **argkw):
               # add stuff
               print("Hello World")
               return func(*args, **argkw)
           return new_func

       @decorator
       def f():
           pass

       # run function f
       f()
       # result:
       # Hello World

2. Open file, read, write

   Open: f = open("hello.text", flag), flag: 'r' = read, 'b' = binary, 'w' = write
   read single line: f. »
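As a slightly fuller sketch of the decorator pattern, the example below counts how often a function is called. The names `count_calls` and `add` are hypothetical; `functools.wraps` is the standard-library helper that keeps the wrapped function's name and docstring.

```python
import functools

def count_calls(func):
    @functools.wraps(func)  # preserve func.__name__ and func.__doc__
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # extra behavior added around the original call
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def add(a, b):
    """Return a + b."""
    return a + b

total = add(1, 2) + add(3, 4)
print(total, add.calls, add.__name__)
```

Writing `@count_calls` above `add` is shorthand for `add = count_calls(add)`, so the name `add` now refers to `wrapper`.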


Useful tool kit

Useful tools for Python development:

1. Track memory usage line by line: https://pypi.org/project/memory_profiler/
2. Track run time for each line: https://pypi.org/project/line_profiler/
3. Regular expressions: https://docs.python.org/3/library/re.html
4. Google word2vec, trained model GoogleNews-vectors-negative300.bin: https://code.google.com/archive/p/word2vec/
5. Trace execution for each line/function flow:

       python -m trace --count -C . somefile.py ...

   https://docs.python.org/3.7/library/trace.html
6. Get time analysis for a statement execution:

       usage: python -m timeit 'stmt' # number = 10000
       or: timeit. »
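For item 6, the same measurement can be taken from inside a script with the standard-library `timeit` module. This is a minimal sketch: the statement being timed is an arbitrary example, and `number=10000` is chosen only to mirror the comment in the note.

```python
import timeit

stmt = "sum(range(100))"  # arbitrary statement to time

# Total seconds taken to run the statement 10000 times.
elapsed = timeit.timeit(stmt, number=10000)
per_call = elapsed / 10000

# Repeat the measurement and keep the best run to reduce noise.
best = min(timeit.repeat(stmt, number=10000, repeat=3))

print(f"total: {elapsed:.6f}s, per call: {per_call:.9f}s, best of 3: {best:.6f}s")
```

Taking the minimum over several repeats is the approach the `timeit` documentation suggests, since slower runs usually reflect interference from other processes.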


CPP Notes

1. lvalue and rvalue

   You can take the memory address of an lvalue; you can't for an rvalue.

2. lvalue reference

   A reference in C++ is like a pointer in C, but not exactly the same. When you pass an lvalue as a parameter, you can simply refer to its address: since its memory address is known, the data can be accessed without a deep copy, saving both time and space.

3. rvalue reference

   When you pass an rvalue as a parameter, you can't refer to its address (for example, when a function returns, a heap object has been allocated, but the variable holding the object's heap address is popped with the stack frame). »


ML-DL Notes

Notes on machine learning and deep learning models, design, and training techniques.

Mainly includes:

- CNN, RNN (LSTM)
- ensemble techniques
- activation functions

1. Deep learning concepts
   1. LSTM, a type of RNN; most successful RNNs are LSTM RNNs
      1. uses a vector as the cell state to record state
      2. uses three gates to erase, update, and output the cell state

      Example: Rakuten data challenge
      - Character-level tokenization
      - Ensembling: bidirectional training
2. Ensemble learning

   Bagging (bootstrap aggregating): »
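The three gates can be sketched with one step of a scalar LSTM cell in plain Python. This assumes the standard LSTM formulation with scalar state for readability (real cells use vectors and weight matrices). The function `lstm_step` and the weight dictionary layout are hypothetical, not taken from the notes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, weights):
    """One step of a scalar LSTM cell.

    weights maps a gate name to a hypothetical (w_x, w_h, bias) triple.
    """
    def pre_activation(name):
        w_x, w_h, bias = weights[name]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))       # erase: how much old state to keep
    write = sigmoid(pre_activation("input"))         # update: how much new content to add
    expose = sigmoid(pre_activation("output"))       # output: how much state to reveal
    candidate = math.tanh(pre_activation("candidate"))

    c = forget * c_prev + write * candidate           # new cell state
    h = expose * math.tanh(c)                         # new hidden state
    return h, c

# With all-zero weights every gate is 0.5 and the candidate is 0.
zero_weights = {name: (0.0, 0.0, 0.0)
                for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, weights=zero_weights)
print(h, c)
```

In the zero-weight case the cell simply halves the previous cell state, which makes the role of the forget gate easy to check by hand.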


NLTK

1. tokenize

https://www.nltk.org/_modules/nltk/tokenize.html

Supports sentence and word tokenization for 17 languages.

Source code for the sentence tokenizer:

```python
def sent_tokenize(text, language='english'):
    """
    Return a sentence-tokenized copy of *text*,
    using NLTK's recommended sentence tokenizer
    (currently :class:`.PunktSentenceTokenizer`
    for the specified language).

    :param text: text to split into sentences
    :param language: the model name in the Punkt corpus
    """
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
    return tokenizer.tokenize(text)
```

Word tokenizer: word_tokenize(text, language='english', preserve_line=False)

If preserve_line is False, the sentence tokenizer is called first; otherwise it is not. »
