Insights into World Wide Web – My Humble Thoughts about Web Dev and Website Reviews

How Large Language Models Really Work: Next-Token Prediction at the Core

Dr. Anjing Wang October 7, 2025

If you peel away all the complexity of modern large language models (LLMs)—billions of parameters, reinforcement learning from human feedback, retrieval-augmented generation—the essence of how they work comes down to…

Understanding the Three Types of Transformers: Encoder, Decoder, and Encoder–Decoder

Dr. Anjing Wang October 7, 2025

The term Transformer has become almost synonymous with modern large language models (LLMs). But when people talk about “encoder-only,” “decoder-only,” or “encoder–decoder” architectures, they are drawing on terminology that predates…

Why Hugging Face Shows the “Wrong” Parameter Count for AWQ Models

Dr. Anjing Wang September 19, 2025

If you’ve loaded an AWQ-quantized model from Hugging Face (like Qwen2.5-VL-3B-Instruct-AWQ), you might have noticed something confusing: 👉 Hugging Face says the model has ~0.9B parameters, but the architecture is…

PyTorch Matrix Multiplication: matmul, mm, and @

Dr. Anjing Wang September 19, 2025

Matrix multiplication is one of the most fundamental operations in machine learning. In PyTorch, you’ll often see three different ways to do it: torch.matmul, torch.mm, and the @ operator. At first glance, they look interchangeable…

LoRA: Low-Rank Adaptation Made Simple

Dr. Anjing Wang September 17, 2025

Large language models are huge — billions of parameters, often stored as massive square weight matrices like 4096 × 4096. Fine-tuning all of those parameters for a new task is…
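To put the 4096 × 4096 figure in perspective, here is a rough back-of-the-envelope sketch comparing the trainable parameters of one full weight matrix with a low-rank pair of factors. The rank `r = 8` and the names `full_params` and `lora_params` are assumptions chosen for illustration, not values taken from the post.

```python
# Hypothetical sizes: d matches the 4096 x 4096 example above;
# the rank r = 8 is an assumed value, picked only for illustration.
d = 4096
r = 8

full_params = d * d          # every entry of the d x d weight matrix
lora_params = d * r + r * d  # two low-rank factors: (d x r) and (r x d)

print(f"full fine-tuning: {full_params:,} trainable parameters")
print(f"low-rank update (r={r}): {lora_params:,} trainable parameters")
print(f"ratio: {lora_params / full_params:.4%}")
```

With these assumed numbers, the low-rank factors hold well under one percent of the parameters in the full matrix.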

Mixed Precision Training: Faster Deep Learning Without Losing Accuracy

Dr. Anjing Wang September 17, 2025

Training today’s deep learning models is resource-hungry. Models have billions of parameters, and every step requires trillions of floating-point operations. To make training feasible, researchers and engineers rely on mixed…

Understanding FP32, FP16, and BF16: Floating-Point Formats in Deep Learning

Dr. Anjing Wang September 17, 2025

Modern deep learning wouldn’t be possible without floating-point numbers. They’re the backbone of every matrix multiplication, activation, and gradient update. But as models grow larger and GPUs become more specialized,…

Demystifying Floating-Point Precision: Half, Single, and Double

Dr. Anjing Wang September 16, 2025

If you’ve ever written code in Python, CUDA, or TensorFlow, you’ve probably seen terms like float16, float32, or float64. They map directly to the IEEE-754 floating-point standard: half precision (16-bit), single precision (32-bit), and double precision (64-bit). But what do…
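One quick way to see what float16, float32, and float64 mean in practice is to store the same number in each width and read it back. The sketch below uses only Python's standard `struct` module; the helper name `round_trip` is made up for this example.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack value into the given IEEE-754 width, then unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# struct format codes: "e" = 16-bit half, "f" = 32-bit single, "d" = 64-bit double.
# The "<" prefix selects little-endian byte order with standard sizes.
for name, fmt in [("float16", "<e"), ("float32", "<f"), ("float64", "<d")]:
    size_bits = struct.calcsize(fmt) * 8
    print(f"{name}: {size_bits} bits, 0.1 stored as {round_trip(0.1, fmt)!r}")
```

The narrower the format, the further the stored value drifts from 0.1, because fewer bits are available for the fraction.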

Causal Language Models vs. Conditional Generation Models: Key Differences and Real-World Examples

Dr. Anjing Wang February 7, 2025

Natural language generation has rapidly evolved with the rise of large language models, but one common point of confusion is distinguishing between causal language models (CLMs) and conditional generation models.…

Unlocking the Power of Conditional Generation in AI

Dr. Anjing Wang February 2, 2025

Artificial Intelligence (AI) has revolutionized how we interact with technology, from chatbots that answer questions to AI models that generate lifelike images and translate languages instantly. But behind many of…

Unveiling NVIDIA DGX: A Journey Through AI Supercomputing

Dr. Anjing Wang November 19, 2024

NVIDIA DGX (Deep GPU Xceleration) is synonymous with cutting-edge artificial intelligence (AI) infrastructure. Designed to accelerate AI research and applications, the DGX family of systems provides unparalleled computational power for…

How to Measure the Performance of OCR: Why BLEU Isn’t Always the Best Choice

Dr. Anjing Wang November 13, 2024

Optical Character Recognition (OCR) is a technology that converts images of text (such as scanned documents, photos, or screenshots) into machine-readable text. While OCR has come a long way, evaluating…

Understanding sklearn.metrics.accuracy_score and How to Calculate Accuracy Manually

Dr. Anjing Wang November 13, 2024

When evaluating machine learning models, accuracy is one of the most commonly used metrics for classification tasks. In this blog post, we’ll dive into the accuracy_score function provided by Scikit-Learn’s…
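For a sense of what calculating accuracy by hand looks like, here is a minimal sketch in plain Python. The function name `manual_accuracy` and the sample labels are made up for illustration. It computes the fraction of matching labels, which is what `accuracy_score` returns with its default settings.

```python
def manual_accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty inputs")
    # Count positions where the prediction equals the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels only: 4 of the 5 predictions are correct.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(manual_accuracy(y_true, y_pred))  # prints 0.8
```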

Supervised Fine-Tuning (SFT): How to Fine-Tune Your Model Like a Pro

Dr. Anjing Wang November 1, 2024

In the world of machine learning, pretrained models are like a treasure chest of knowledge. They save us hours, days, or even weeks of training time, allowing us to…

Deep Learning Model Precision: FP32, BF16, INT8 and INT4

Dr. Anjing Wang October 31, 2024

When training or deploying deep learning models, precision isn’t just about getting accurate predictions—it’s also about finding the right balance between performance, memory usage, and speed. Choosing the optimal precision…