David's raw ML reference notes
Folder: 05-Data-engineering-and-information-science/Similarity-search-systems/Nearest-neighbor-search-(vector-search)/Vector-databases-(VDBMS)
1 item under this folder.
Feb 14, 2025
Vector databases (VDBMS)