September 2025 – Insights into World Wide Web

Why Hugging Face Shows the “Wrong” Parameter Count for AWQ Models

Dr. Anjing Wang, September 19, 2025

If you’ve loaded an AWQ-quantized model from Hugging Face (such as Qwen2.5-VL-3B-Instruct-AWQ), you might have noticed something confusing: Hugging Face says the model has ~0.9B parameters, but the architecture is…
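A rough sketch of one plausible cause, assuming the checkpoint stores 4-bit weights packed eight to a 32-bit integer (a common layout for AWQ kernels, although the exact packing order varies by implementation). Any tool that counts tensor elements then sees one element per packed word, so quantized layers look about eight times smaller, while unquantized tensors such as embeddings are counted in full. The helpers pack_int4, unpack_int4 and apparent_param_count are hypothetical and exist only to illustrate the arithmetic.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) into 32-bit words, 8 values per word.

    Hypothetical layout: value j of each group goes into bits 4*j .. 4*j+3.
    """
    if len(values) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    packed = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            if not 0 <= v <= 15:
                raise ValueError("value out of 4-bit range")
            word |= v << (4 * j)
        packed.append(word)
    return packed


def unpack_int4(packed):
    """Inverse of pack_int4: recover the 4-bit values from each 32-bit word."""
    return [(word >> (4 * j)) & 0xF for word in packed for j in range(8)]


def apparent_param_count(quantized_weights, full_precision_params):
    """Element count a naive counter would report under the packing assumption:
    quantized weights appear as int32 words (1/8 of the real count), while other
    tensors are counted in full. Scales and zero points are ignored here."""
    return quantized_weights // 8 + full_precision_params


weights = list(range(16))
packed = pack_int4(weights)
print(len(weights), "weights ->", len(packed), "packed words")
print(apparent_param_count(4096 * 4096, 0))
```

Under this assumption, a 4096 × 4096 projection holding 16,777,216 real weights would be reported as 2,097,152 elements, which is why the displayed total can land well below the architecture's true size.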

PyTorch Matrix Multiplication: matmul, mm, and @

Dr. Anjing Wang, September 19, 2025

Matrix multiplication is one of the most fundamental operations in machine learning. In PyTorch, you’ll often see three different ways to do it: torch.matmul, torch.mm, and the @ operator. At first glance, they look interchangeable…
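A short sketch of where the three forms agree and where they differ, assuming a recent PyTorch install. For 2-D inputs all three produce the same result; torch.matmul (which @ calls) also broadcasts over leading batch dimensions and accepts 1-D vectors, while torch.mm accepts only 2-D matrices. The tensor shapes are arbitrary illustrative choices.

```python
import torch

torch.manual_seed(0)
A = torch.randn(2, 3)
B = torch.randn(3, 4)

# 2-D x 2-D: all three forms compute the same (2, 4) matrix product.
out_mm = torch.mm(A, B)
out_matmul = torch.matmul(A, B)
out_at = A @ B
print(torch.allclose(out_mm, out_matmul), torch.allclose(out_matmul, out_at))

# matmul / @ broadcast over a leading batch dimension: (5, 2, 3) @ (3, 4) -> (5, 2, 4).
batch = torch.randn(5, 2, 3)
print((batch @ B).shape)

# matmul / @ also treat a 1-D right operand as a vector: (2, 3) @ (3,) -> (2,).
v = torch.randn(3)
print((A @ v).shape)

# mm is strictly 2-D and raises on a 3-D input.
try:
    torch.mm(batch, B)
except RuntimeError as err:
    print("torch.mm rejected 3-D input:", err)
```

In practice, torch.mm is useful when you want an error if a batch dimension sneaks in, while @ is the more flexible default.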

LoRA: Low-Rank Adaptation Made Simple

Dr. Anjing Wang, September 17, 2025

Large language models are huge — billions of parameters, often stored as massive square weight matrices like 4096 × 4096. Fine-tuning all of those parameters for a new task is…
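A small worked sketch of the savings, assuming the standard LoRA setup: the frozen weight W (d_out × d_in) receives an update B·A, where A is rank × d_in and B is d_out × rank, scaled by alpha / rank. The function names, the rank of 8, and the tiny matrices in the forward pass are illustrative choices, not values from the article.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (params touched by full fine-tuning, trainable LoRA params)."""
    full = d_out * d_in                  # every entry of W
    lora = rank * d_in + d_out * rank    # A: rank x d_in, B: d_out x rank
    return full, lora


def matvec(M, x):
    """Plain-Python matrix-vector product (M is a list of rows)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """h = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))      # low-rank path: d_in -> rank -> d_out
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

For a 4096 × 4096 matrix, rank 8 trains 65,536 parameters instead of 16,777,216, about 0.39% of the full count.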

Mixed Precision Training: Faster Deep Learning Without Losing Accuracy

Dr. Anjing Wang, September 17, 2025

Training today’s deep learning models is resource-hungry. Models have billions of parameters, and every step requires trillions of floating-point operations. To make training feasible, researchers and engineers rely on mixed…
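Mixed precision with FP16 is usually paired with loss scaling, because small gradients can underflow to zero in half precision. The sketch below emulates FP16 rounding with Python's struct module (format code 'e', IEEE 754 half precision); the gradient value and the fixed scale factor 2**16 are arbitrary illustrative choices. Frameworks such as PyTorch adjust the scale dynamically (e.g. with GradScaler) rather than fixing it.

```python
import struct


def to_fp16(x):
    """Round a Python float (FP64) to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8                      # small gradient: fine in FP32, too small for FP16
print(to_fp16(grad))             # 0.0: underflows below FP16's smallest subnormal

scale = 2.0 ** 16                # hypothetical fixed loss-scale factor
scaled = to_fp16(grad * scale)   # the scaled value lands in FP16's normal range
print(scaled != 0.0)             # True: the information survives

# Unscale in higher precision (here a Python float) before the optimizer step.
print(scaled / scale)
```

The design point is that scaling happens before the FP16 cast and unscaling after it, so the small magnitude only ever exists in a format that can hold it.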

Understanding FP32, FP16, and BF16: Floating-Point Formats in Deep Learning

Dr. Anjing Wang, September 17, 2025

Modern deep learning wouldn’t be possible without floating-point numbers. They’re the backbone of every matrix multiplication, activation, and gradient update. But as models grow larger and GPUs become more specialized,…
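A sketch of the range-versus-precision trade-off, assuming the usual bit layouts: FP32 has 1 sign, 8 exponent and 23 mantissa bits; FP16 has 1, 5 and 10; BF16 has 1, 8 and 7. BF16 keeps the FP32 exponent, so it can be emulated by rounding an FP32 bit pattern to its top 16 bits. The to_bf16 helper is a hypothetical illustration that ignores NaN handling, not a library function.

```python
import struct


def to_fp16(x):
    """Round to IEEE 754 half precision via struct's 'e' format.
    Note: struct raises OverflowError where FP16 hardware would produce inf."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x):
    """Emulate bfloat16: round the FP32 bit pattern to its top 16 bits
    (round-to-nearest-even; NaN inputs are not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)        # ties go to the even result
    bits = ((bits + rounding) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


for value in (1.0 + 2 ** -8, 70000.0):
    try:
        fp16 = to_fp16(value)
    except OverflowError:
        fp16 = "overflow (FP16 max is 65504)"
    print(f"{value!r}: fp16={fp16}, bf16={to_bf16(value)}")
```

FP16 resolves 1 + 2^-8 exactly but cannot hold 70000, while BF16 holds 70000 (as 70144) but rounds 1 + 2^-8 back to 1.0: more range, less precision.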

Demystifying Floating-Point Precision: Half, Single, and Double

Dr. Anjing Wang, September 16, 2025

If you’ve ever written code in Python, CUDA, or TensorFlow, you’ve probably seen terms like float16, float32, or float64. They map directly to the IEEE 754 half-, single-, and double-precision formats (binary16, binary32, and binary64). But what do…
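A minimal sketch that measures storage size and machine epsilon (the gap between 1.0 and the next representable number) for each format, using Python's struct format codes 'e', 'f' and 'd' for IEEE 754 half, single and double precision. The helper names are hypothetical and chosen for illustration.

```python
import struct

FORMATS = {"float16 (half)": "e", "float32 (single)": "f", "float64 (double)": "d"}


def round_to(fmt, x):
    """Round a Python float to the given struct format and convert back."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, x))[0]


def machine_epsilon(fmt):
    """Smallest power of two eps such that 1 + eps is distinguishable from 1."""
    eps = 1.0
    while round_to(fmt, 1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps


for name, fmt in FORMATS.items():
    bits = 8 * struct.calcsize("<" + fmt)
    print(f"{name:18} {bits:2d} bits  epsilon={machine_epsilon(fmt)!r}")
```

The epsilons come out as 2^-10, 2^-23 and 2^-52, matching the 10, 23 and 52 stored mantissa bits of the three formats.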