What is Weights & Biases?
Weights & Biases is a machine learning experiment tracking and MLOps platform that gives teams a single system of record for every training run — logging hyperparameters, metrics, system stats, model weights, and dataset versions automatically. Think of it as Git for your model training: every experiment is reproducible, every result is traceable, and every team member can see exactly what changed between runs. Founded in 2018, W&B grew to over 700,000 ML practitioners and tracked nearly 300 million hours of training experiments before being acquired by CoreWeave for $1.7 billion in May 2025. By 2026, the platform has expanded beyond classical model training into LLM observability and agent evaluation through its W&B Weave product, reflecting where ML engineering budgets are now being spent.
Key Takeaways
- W&B's two-line SDK integration works with PyTorch, TensorFlow, Hugging Face, JAX, and most ML frameworks — setup takes under 30 minutes.
- Tracked hours billing multiplies by GPU count: an 8-GPU job running for one hour costs 8 tracked hours, which surprises teams running large parallel sweeps.
- CoreWeave's $1.7B acquisition in 2025 makes W&B part of a vertically integrated GPU cloud stack — no longer a fully neutral, infrastructure-agnostic tool.
- W&B Weave, its LLM tracing and evaluation layer, has become the platform's primary growth area as teams shift focus from model training to agent and RAG system evaluation.
- W&B is the most cited experiment logging tool in NeurIPS and ICML papers, meaning ML researchers arrive at new jobs already fluent in it.
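The "two-line" integration from the first takeaway can be sketched roughly as below. This is a hedged example, not official sample code: the project name, config values, and loss curve are hypothetical, and `mode="offline"` keeps logs local so no API key or network access is needed. The try/except keeps the toy loop runnable even where the package is absent.

```python
"""Sketch of the minimal W&B integration pattern (hypothetical values)."""
try:
    import wandb  # assumes `pip install wandb`
except ImportError:  # package not installed: skip logging, keep the loop
    wandb = None

def train(config):
    # Line 1 of the "two-line" pattern: start a run and record hyperparameters.
    run = wandb.init(project="demo", config=config, mode="offline") if wandb else None
    losses = []
    for epoch in range(config["epochs"]):
        loss = 1.0 / (epoch + 1)  # stand-in for a real training loss
        losses.append(loss)
        if run:
            # Line 2: log metrics for this step; W&B plots them live.
            wandb.log({"epoch": epoch, "loss": loss})
    if run:
        run.finish()  # flush and close the run
    return losses

losses = train({"epochs": 3, "lr": 1e-3})
```

In a real training script, the only additions are the `wandb.init` and `wandb.log` calls; the rest is whatever loop you already have.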
Key Features
W&B's strength is giving ML teams visibility into the one thing that's historically been opaque: what actually happened during training.
- Experiment Tracking automatically captures metrics, hyperparameters, system stats, and code diffs for every run, enabling side-by-side comparisons across hundreds of experiments.
- Sweeps orchestrate distributed hyperparameter optimization using Bayesian, random, or grid search across your compute cluster, visualizing the full search space in real time.
- Artifacts version-controls datasets, models, and evaluation results with full lineage — trace exactly which data and code produced any given model.
- The Model Registry provides a centralized, access-controlled staging area for promoting models through development, staging, and production, with audit logs required by many compliance frameworks.
- Reports are collaborative documents that embed live charts alongside text, letting ML teams share experiment results internally without screenshots going stale.
- W&B Weave is the newest major product: it traces LLM calls, logs prompt chains, and runs structured evaluations for RAG pipelines and agent systems.
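As a concrete sketch of how a Sweep is configured: the dictionary below shows the general shape of a W&B sweep definition, with a search method, a target metric, and a parameter space. The metric name, parameter ranges, and project name are hypothetical placeholders, not values from any real project.

```python
# Hypothetical sweep configuration in the shape W&B's Sweeps API expects.
sweep_config = {
    "method": "bayes",  # Bayesian search; "random" and "grid" also supported
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            # sample log-uniformly between the actual min/max values
            "distribution": "log_uniform_values",
            "min": 1e-4,
            "max": 1e-1,
        },
        "batch_size": {"values": [32, 64, 128]},  # discrete choices
    },
}

# With the wandb package installed and a training function `train_fn`,
# the sweep would be launched along these lines:
#   sweep_id = wandb.sweep(sweep_config, project="demo")
#   wandb.agent(sweep_id, function=train_fn, count=20)
```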
W&B vs MLflow vs Neptune
The choice between these three tools comes down to where you sit on the control-vs-convenience spectrum. MLflow is free, open-source, and self-hostable — the right call for teams on the Databricks platform or those who refuse to pay SaaS fees for experiment tracking. It is operationally heavier and the UI lags behind W&B's, but zero vendor dependency is a real advantage. Neptune.ai is built for extreme scale: it claims up to 1,000x more throughput than W&B and handles comparisons across 100,000+ runs with millions of data points, making it the pick for large research organizations that run massive parallel sweeps and find W&B dashboards slowing down. W&B wins on developer experience, documentation quality, and collaborative Reports — it is the tool most ML researchers already know, which lowers onboarding friction on new teams. If your team trains models at moderate scale and values fast setup and polished tooling, W&B is still the default.
Pricing and the Tracked Hours Gotcha
W&B's free tier is genuinely useful: 5 model seats, 5 GB artifact storage, and 1 GB of monthly Weave ingestion — enough for solo practitioners and academic work. Pro costs $50 per user per month and adds 500 tracked hours and 100 GB of storage. Enterprise runs $315–$400 per seat per month, negotiated with sales, and adds SSO/SAML, audit logs, HIPAA compliance, and unlimited tracked hours. The billing model's key trap is that tracked hours scale with GPU count, not wall-clock time. Running an 8-GPU distributed job for one day doesn't consume 24 tracked hours; it consumes 192. Teams running large parallel sweeps on powerful GPU clusters have reported their tracked hours growing at multiples of expected real-time. This is worth modeling carefully before committing to a Pro contract, and it is a meaningful reason some teams switch to self-hosted MLflow at scale.
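The tracked-hours arithmetic is easy to sanity-check before signing a contract. A minimal sketch, using the Pro allowance cited above and a hypothetical job schedule:

```python
PRO_MONTHLY_TRACKED_HOURS = 500  # Pro plan allowance described above

def tracked_hours(gpu_count: int, wall_clock_hours: float) -> float:
    """Tracked hours scale with GPU count, not just wall-clock time."""
    return gpu_count * wall_clock_hours

one_day_8gpu = tracked_hours(8, 24)  # 192 tracked hours, not 24
week_of_runs = 7 * one_day_8gpu      # a week of daily 8-GPU jobs: 1344

# A single week of daily 8-GPU jobs is already more than 2.5x a
# Pro seat's monthly allowance.
over_budget = week_of_runs > PRO_MONTHLY_TRACKED_HOURS
```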
The CoreWeave Acquisition and What It Means
W&B spent its first seven years as a neutral observability layer — it worked equally well whether you trained on AWS, GCP, Azure, or on-premises. The CoreWeave acquisition changes that positioning. CoreWeave is a GPU cloud provider that competes directly with AWS and GCP for AI compute workloads, and W&B is now owned by one of those competitors. CoreWeave has committed publicly to maintaining interoperability, and the first post-acquisition products — Mission Control compute integration and W&B Inference — are positioned as additive rather than restrictive. But the strategic incentive to favor CoreWeave infrastructure exists, and enterprise teams evaluating long-term platform lock-in should weigh this. On the upside, the integration between W&B's experiment tracking and CoreWeave's GPU clusters is now tighter than any competing combination, which is a real productivity gain for teams that are already CoreWeave customers.
Who Hires for W&B and Why
W&B expertise appears in job postings for ML engineers, research scientists, and MLOps engineers across nearly every sector doing serious model development — AI labs, Big Tech, pharma (Pfizer), automotive (Toyota, Volkswagen), and financial services. The tool is so widely adopted that listing it on a resume is more of a baseline expectation than a differentiator; what actually signals value is knowing how to design a clean experiment tracking setup, structure artifact lineage for reproducibility audits, and use Sweeps efficiently. Fractional ML consultants are often brought in specifically to improve an organization's experiment infrastructure — setting up W&B, enforcing logging conventions, and building dashboards that research leads can actually use. We increasingly see this pattern in engagements where a company has a scrappy ML team running experiments but no visibility into what worked, why it worked, or how to reproduce it.
The Bottom Line
Weights & Biases remains the experiment tracking platform that most ML practitioners reach for first, earned through years of developer-focused design and the simple fact that most researchers already know it from academic work. The CoreWeave acquisition adds a strategic wrinkle for teams concerned about long-term vendor neutrality, but hasn't yet disrupted the platform's core value. For companies hiring through Pangea, W&B experience signals an ML engineer who cares about reproducibility and experimental rigor — someone who runs training as a disciplined, observable process rather than a black box.

