Commit 1b07bbe

two success stories
1 parent 5272c88 commit 1b07bbe

3 files changed: +149 -79 lines changed
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
1+
---
2+
title: Full-Stack OCaml Web Application
3+
logo: success-stories/ahrefs.svg
4+
card_logo: success-stories/white/ahrefs.svg
5+
background: /success-stories/ahrefs-bg.jpg
6+
theme: blue
7+
synopsis: "Ahrefs transitioned from PHP/jQuery to full-stack OCaml using Melange and React, eliminating team silos and enabling any engineer to contribute across their entire web application stack."
8+
url: https://ahrefs.com/
9+
priority: 2
10+
why_ocaml_reasons:
11+
- Type Safety
12+
- Unified Technology Stack
13+
- Team Efficiency
14+
- Integration with JavaScript Ecosystem
15+
- Shared Data Types
16+
- Developer Productivity
17+
- Code Reliability
18+
---
19+
20+
## Challenge
21+
22+
Ahrefs is a Singapore-based SaaS company that provides SEO tools and marketing intelligence powered by big data. Since 2011, they've built their business around OCaml, using it for web crawling and data processing to serve thousands of customers worldwide. Today, they're trusted by 44% of Fortune 500 companies and operate as a lean, self-funded organization focused on efficiency.
23+
24+
By 2017, Ahrefs had built a successful SEO tools business powered by OCaml on the backend, but they faced a bottleneck in web application development. Their frontend was built with PHP and jQuery while their data processing lived in OCaml. Every time frontend developers needed backend data, they had to coordinate with backend engineers to update the APIs.
25+
26+
Ahrefs wanted engineers to be productive across the entire stack, but the technology divide made this unnecessarily difficult. The JavaScript tooling used in 2017 for the frontend of the web application was lacking compared to today's TypeScript ecosystem. Ahrefs had already built years of expertise in OCaml. The question became: could they extend OCaml's benefits to the frontend?
27+
28+
The challenge was both technical and cultural. Could they transition the entire frontend team to OCaml, even though some of the engineers had never used a functional programming language? Would the benefits of a unified stack outweigh the costs?
29+
30+
## Result
31+
32+
After adopting Reason/BuckleScript around 2017-2020 and migrating to **[Melange](https://melange.re/)** in 2023, Ahrefs now has a full-stack web development setup built around OCaml.
33+
34+
Now, any engineer in the company can contribute across the entire web application. Thanks to shared types between backend and frontend, coordination overhead is greatly reduced.
35+
36+
Frontend and backend stay in sync: When data structures change, type errors guide developers to update all affected code.
37+
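A minimal sketch of how shared types keep both sides in sync, using only the OCaml standard library. The `Shared` module, the `account` record, and the function names are hypothetical illustrations, not Ahrefs code; in a real project the shared module would be a library that both the native backend and the Melange frontend depend on.

```ocaml
(* Hypothetical shared module: one definition used by backend and frontend. *)
module Shared = struct
  type plan = Free | Standard | Enterprise

  type account = { id : int; email : string; plan : plan }
end

(* "Backend" side: builds the value that would be sent to the client. *)
let load_account (id : int) : Shared.account =
  { Shared.id = id; email = "user@example.com"; plan = Shared.Standard }

(* "Frontend" side: the match below covers every constructor of [plan]. *)
let plan_label (p : Shared.plan) : string =
  match p with
  | Shared.Free -> "Free"
  | Shared.Standard -> "Standard"
  | Shared.Enterprise -> "Enterprise"

let () =
  let account = load_account 42 in
  print_endline (plan_label account.Shared.plan)
```

If a new constructor were added to `Shared.plan`, the compiler would report the match in `plan_label` as non-exhaustive, which is the kind of guidance the paragraph above describes.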
38+
Today, their **5 billion daily frontend requests** are handled by the same OCaml codebase that powers their backend systems. The web application serving **44% of Fortune 500 companies** is built from **1.5 million lines of OCaml code** spanning both frontend and backend.
39+
40+
## Why OCaml
41+
42+
For Ahrefs, extending OCaml to the frontend wasn't about technological purity—it was about simplifying their business.
43+
44+
* **Shared types eliminate coordination overhead** - Using OCaml to express the shape of data exchanged between frontend and backend increases maintainability and simplifies development.
45+
* **Faster iteration cycles** - Type safety meant changes to data structures propagated safely throughout the entire application without runtime surprises, enabling rapid feature development.
46+
* **Melange bridges ecosystems** - Access to the JavaScript ecosystem (React components, npm packages) while maintaining OCaml's compile-time guarantees meant they didn't have to choose between type safety and ecosystem richness.
47+
48+
## Solution
49+
50+
Ahrefs built their full-stack solution around **[OCaml](https://ocaml.org/)** compiled to JavaScript via **[Melange](https://melange.re/)**, paired with **[React](https://react.dev/)** for the user interface.
51+
52+
The cornerstone of their architecture is **[ATD (Adjustable Type Definitions)](https://github.com/ahrefs/atd)**. Ahrefs developed ATD to generate shared types for their frontend and backend -- initially in BuckleScript.
53+
54+
Their frontend follows React patterns. Components are written in OCaml and compiled to JavaScript, with state management and data flow handled through React paradigms. Expressing all of this through OCaml ensures that data shapes match across the entire application.
55+
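As an illustration of "state management through React paradigms", here is a sketch of the reducer pattern in plain OCaml, with no React or Melange dependency. The `state` and `action` types are hypothetical and only stand in for whatever a real component would track.

```ocaml
(* Hypothetical component state for a search box. *)
type state = { query : string; results : string list; loading : bool }

(* Every way the state can change is listed as a constructor. *)
type action =
  | Set_query of string
  | Search_started
  | Search_finished of string list

let initial = { query = ""; results = []; loading = false }

(* Pure reducer: takes the current state and an action, returns the next state. *)
let reduce (state : state) (action : action) : state =
  match action with
  | Set_query query -> { state with query }
  | Search_started -> { state with loading = true }
  | Search_finished results -> { state with results; loading = false }

(* Replay a sequence of actions, as a UI runtime would do one at a time. *)
let final =
  List.fold_left reduce initial
    [ Set_query "ocaml"; Search_started; Search_finished [ "ocaml.org" ] ]

let () =
  Printf.printf "%s: %d result(s)\n" final.query (List.length final.results)
```

Because `reduce` is an ordinary function over ordinary types, the same definitions can be type-checked alongside backend code, which is the property the section relies on.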
56+
Integration with their existing data infrastructure (**[ClickHouse](https://clickhouse.com/)**, **[MySQL](https://www.mysql.com/)**, **[Elasticsearch](https://www.elastic.co/)** on **[AWS](https://aws.amazon.com/)**) is seamless, with frontend and backend sharing the same type definitions. Rather than maintaining separate API contracts, the team treats the database as the source of truth, and data shapes are automatically reflected throughout the application.
57+
58+
## Lessons Learned
59+
60+
* **Shared types eliminate entire bug categories**: Automatic synchronization between frontend and backend data structures prevents integration issues that commonly plague web applications.
61+
* **Team learning pays off**: Transitioning frontend developers to OCaml requires investment, but tools like the **[Melange playground](https://melange.re/v5.0.0/playground)** make onboarding approachable. Productivity gains compound over time.
62+
* **Writing bindings is manageable with good resources**: Interfacing with the existing JavaScript ecosystem initially seemed daunting, but the **[official Melange documentation](https://melange.re/)** and **[community resources](https://github.com/melange-community/bindings)** provide clear guidance for common patterns.
63+
* **Gradual migration reduces risk**: Starting with small components or isolated features allows teams to build confidence before transitioning entire applications.
64+
65+
## Open Source
66+
67+
Ahrefs contributes actively to the full-stack OCaml ecosystem, sharing tools that benefit the broader community:
68+
69+
- **[Melange Recharts](https://github.com/ahrefs/melange-recharts):** Production-ready charting components for data visualization applications.
70+
- **[Melange Bindings](https://github.com/melange-community/bindings):** Community-driven repository of JavaScript library bindings.
71+
- **[Melange JSON PPX](https://github.com/ahrefs/melange-json-ppx):** Streamlined JSON handling for frontend applications.
72+
- **[Ahrefs DevKit](https://github.com/ahrefs/devkit):** Utilities and tools for building OCaml applications.
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
---
2+
title: Petabyte-Scale Web Crawling and Data Processing
3+
logo: success-stories/ahrefs.svg
4+
card_logo: success-stories/white/ahrefs.svg
5+
background: /success-stories/ahrefs-bg.jpg
6+
theme: blue
7+
synopsis: "Ahrefs built the world's third-largest web crawler using OCaml, processing 500 billion requests daily and indexing petabytes of web data with a lean, efficient team."
8+
url: https://ahrefs.com/
9+
priority: 2
10+
why_ocaml_reasons:
11+
- Performance
12+
- Reliability
13+
- Expressiveness
14+
- Native Compilation
15+
- Industrial Strength
16+
- Scalability
17+
- Maintainability
18+
---
19+
20+
## Challenge
21+
22+
Ahrefs is a Singapore-based SaaS company that provides comprehensive SEO tools and marketing intelligence powered by big data. Since 2011, they've been crawling the entire web daily to maintain extensive databases of backlinks, keywords, and website analytics that help businesses with SEO strategy, competitor analysis, and content optimization. Today, they're trusted by 44% of Fortune 500 companies.
23+
24+
Building and operating a web crawler at internet scale presents extraordinary challenges. Ahrefs needed to index billions of web pages continuously, process petabytes of data in real time, and turn this massive dataset into actionable insights for thousands of customers worldwide. The technical demands are staggering: their systems must handle **500 billion backend requests per day** while maintaining **over 100PB of storage**.
25+
26+
As a self-funded company, Ahrefs couldn't solve these challenges by throwing unlimited resources at the problem. They needed maximum efficiency from a small team—systems that could run reliably for months without intervention, code that could be understood and maintained by a lean engineering organization, and performance that could compete with tech giants despite having a fraction of their headcount.
27+
28+
The question wasn't just whether they could build a web-scale crawler, but whether they could do it sustainably with the constraints of a bootstrapped company.
29+
30+
## Result
31+
32+
Over a decade later, Ahrefs operates one of the world's most sophisticated web crawling operations, ranking as the **third-largest web crawler globally**. Their OCaml-powered systems process **500 billion requests daily**, maintain an index of **456.5 billion pages** across **267.6 million domains**, and update metrics for **300 million pages every 24 hours**.
33+
34+
This technical achievement translates directly to business success. Ahrefs has grown into a **$100M+ ARR company** with **150 employees** managing **4000+ servers**—all while maintaining their original philosophy of operational efficiency. They've become the sector leader in SEO tools, proving that the right technology choices can create sustainable competitive advantages.
35+
36+
The reliability of their OCaml systems is perhaps most impressive: programs written years ago continue running without surprises, requiring minimal maintenance from their engineering team. This "boring" reliability has allowed Ahrefs to focus engineering effort on building new features and capabilities rather than fighting infrastructure fires.
37+
38+
Their success demonstrates that OCaml can power not just technical excellence at massive scale, but sustainable business growth in highly competitive markets.
39+
40+
## Why OCaml
41+
Ahrefs chose OCaml because it addressed their core constraint: building world-class infrastructure with limited resources.
42+
43+
* **Expressiveness reduces team requirements** - OCaml allowed their small team to develop crawling and data processing systems with few lines of code, essential when you can't hire armies of engineers like big tech companies.
44+
* **Reliability minimizes operational overhead** - Systems run for months without surprises, crucial when you can't afford large operations teams to babysit infrastructure.
45+
* **Native performance handles web scale** - Compilation to native code provided the performance needed for processing 500 billion requests daily without requiring expensive hardware optimizations.
46+
* **Type safety prevents data disasters** - When processing petabytes of evolving web data, catching format issues at compile time rather than in production saves hours of debugging and prevents costly system failures.
47+
* **Language philosophy matches business model** - OCaml's expressiveness made it economical to create specialized, efficient systems tailored to their exact requirements rather than adapting bloated generic solutions.
48+
49+
## Solution
50+
51+
Ahrefs built their crawling infrastructure around OCaml's strengths, creating a distributed system that balances performance, reliability, and maintainability. **[OCaml](https://ocaml.org/)** serves as the primary language for all crawling and data processing systems, compiled natively for maximum performance across their **4000+ servers**.
52+
53+
The architecture treats data consistency as paramount. Using **[ATD (Adjustable Type Definitions)](https://github.com/ahrefs/atd)** to define shared data structures, they ensure type safety throughout their processing pipeline—from initial web crawling through to final data storage. This approach catches schema mismatches at compile time rather than runtime, crucial when processing billions of pages daily.
54+
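The idea of catching mismatches between pipeline stages at compile time can be sketched in plain OCaml. The types below are written by hand and are purely hypothetical; in the setup described above, ATD-generated definitions would play this role. The link extraction is a deliberately naive stand-in for real parsing.

```ocaml
(* Hypothetical stage types: each pipeline stage has its own record type. *)
type raw_page = { url : string; body : string }
type parsed_page = { page_url : string; outlinks : string list }

(* Toy parser: treats whitespace-separated tokens starting with "http" as links. *)
let parse (p : raw_page) : parsed_page =
  let tokens = String.split_on_char ' ' p.body in
  let is_link t = String.length t >= 4 && String.sub t 0 4 = "http" in
  { page_url = p.url; outlinks = List.filter is_link tokens }

(* The storage stage only accepts parsed pages. *)
let store (p : parsed_page) : string =
  Printf.sprintf "%s -> %d links" p.page_url (List.length p.outlinks)

let () =
  let raw =
    { url = "https://example.com";
      body = "see http://a.test and https://b.test now" }
  in
  print_endline (store (parse raw))
```

Calling `store raw` directly would be rejected by the type checker, because `raw_page` and `parsed_page` are distinct types; a skipped or mismatched stage therefore surfaces before the program runs.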
55+
Their storage layer combines **[ClickHouse](https://clickhouse.com/)** for analytical workloads, **[MySQL](https://www.mysql.com/)** for transactional data, and **[Elasticsearch](https://www.elastic.co/)** for search functionality, all orchestrated on **[AWS](https://aws.amazon.com/)**. The key insight was designing these systems to work together seamlessly through shared OCaml types rather than complex API layers.
56+
57+
Ahrefs maintains their own libraries and frameworks rather than relying on generic solutions. This "build it ourselves" philosophy requires more initial investment but delivers systems perfectly tailored to web crawling demands. Their **1.5 million lines of OCaml code** represent years of accumulated domain expertise encoded in reliable, maintainable software.
58+
59+
The result is a unified system where improvements to crawling algorithms, data processing pipelines, or storage efficiency can be implemented quickly and deployed confidently across their entire infrastructure.
60+
61+
## Lessons Learned
62+
63+
Ahrefs' experience building web-scale infrastructure in OCaml offers valuable insights:
64+
65+
* **Reliability pays compound interest**: OCaml's "boring" stability means systems built years ago still run without surprises, freeing engineering time for new capabilities rather than maintenance.
66+
* **Types scale better than tests**: At petabyte scale, compile-time guarantees about data consistency prevent entire classes of runtime failures that would be catastrophic at this volume.
67+
* **Expressiveness enables specialization**: OCaml's high-level abstractions made it economical to build highly specialized systems rather than adapting generic frameworks to their unique requirements.
68+
* **Small teams can compete with giants**: The right language choice allowed Ahrefs to build infrastructure that competes with tech giants despite having a fraction of their resources.
69+
* **Performance and maintainability aren't mutually exclusive**: OCaml's combination of native compilation and high-level abstractions delivered both the performance needed for web scale and the clarity needed for long-term maintenance.
70+
71+
## Open Source
72+
73+
Ahrefs supports the OCaml ecosystem through contributions that benefit infrastructure and data processing applications:
74+
75+
- **[Ahrefs DevKit](https://github.com/ahrefs/devkit):** Tools and utilities for building distributed applications.
76+
- **[OCaml Community Tools](https://github.com/ocaml-community):** Contributions to widely used infrastructure tools like `ocurl` and `ocaml-mariadb`.
77+
- **[ATD](https://github.com/ahrefs/atd):** Schema definition language for cross-platform data serialization.
