
(data) Update the Ahrefs success story and add a new story about the full stack ocaml web solution #2863

Open · wants to merge 25 commits into base: main
6b501fc
new Ahrefs success story
sabine Dec 12, 2024
7d6ad7b
fmt
sabine Dec 12, 2024
452d620
Update data/success_stories/ahrefs.md
sabine Dec 14, 2024
2623735
Update data/success_stories/ahrefs.md
sabine Dec 14, 2024
a5911b0
Update data/success_stories/ahrefs.md
sabine Dec 14, 2024
d9162ae
clarification
sabine Dec 14, 2024
5bf970d
Update data/success_stories/ahrefs.md
sabine Dec 14, 2024
4b9aa6d
Update data/success_stories/ahrefs.md
sabine Dec 14, 2024
3415a0e
be more vague on number of requests frontend/backend
sabine Dec 14, 2024
db3a119
devkit / bindings
sabine Dec 14, 2024
ab97a0c
Update data/success_stories/ahrefs.md
sabine Dec 14, 2024
0e89db4
Update src/ocamlorg_web/lib/redirection.ml
sabine Dec 14, 2024
c7989cd
rewrite taking into account feedback, reframe around always being an …
sabine Jun 18, 2025
29bcf6c
add relevant BuckleScript -> ReScript context
sabine Jun 18, 2025
5272c88
edits
sabine Jul 1, 2025
1b07bbe
two success stories
sabine Jul 1, 2025
7f13496
new image for full stack story
sabine Jul 1, 2025
9f99a9a
Update data/success_stories/ahrefs-full-stack-web.md
sabine Jul 4, 2025
acc98a6
addressing @davesnx review, thanks Dave
sabine Jul 4, 2025
5f20674
shorten list of why reasons
sabine Jul 4, 2025
c9229d1
remove redirect bc it's two stories
sabine Jul 11, 2025
5e731d2
redirect for title change of old ahrefs story
sabine Jul 11, 2025
519e7eb
Apply suggestions from code review @Khady
sabine Jul 25, 2025
b6b447e
Apply suggestions from code review
sabine Jul 25, 2025
68da518
editing
sabine Aug 12, 2025
69 changes: 69 additions & 0 deletions data/success_stories/ahrefs-full-stack-web.md
@@ -0,0 +1,69 @@
---
title: Full-Stack React Web Application With OCaml
logo: success-stories/ahrefs.svg
card_logo: success-stories/white/ahrefs.svg
background: /success-stories/ahrefs-full-stack-bg.jpg
theme: blue
synopsis: "Ahrefs transitioned from PHP/jQuery to full-stack OCaml using Melange and React, eliminating team silos and enabling any engineer to contribute across their entire web application stack."
url: https://ahrefs.com/
priority: 2
why_ocaml_reasons:
- Integration with JavaScript Ecosystem
- Shared Data Types
- Developer Productivity
- Code Reliability
---

## Challenge

[Ahrefs](https://ahrefs.com/) is a Singapore-based SaaS company that provides SEO tools and marketing intelligence powered by big data. Since 2011, they've built their business around OCaml, using it for web crawling and data processing to serve thousands of customers worldwide. Today, they're trusted by 44% of Fortune 500 companies and operate as a lean, self-funded organization focused on efficiency.

By 2017, Ahrefs had built a successful SEO tools business powered by OCaml on the backend, but they faced a bottleneck in web application development. Their frontend was built with PHP and jQuery, while their data processing lived in OCaml. Every time frontend developers needed backend data, they had to coordinate with backend engineers to update the APIs.

In 2017, the JavaScript tooling available for frontend development lagged well behind today's TypeScript ecosystem. With BuckleScript and Reason emerging around the same time, Ahrefs saw an opportunity to use OCaml on both the backend and the frontend of their web application.

The challenge was both technical and cultural. Could the entire frontend team learn OCaml, even though some engineers had never used a functional programming language? Would the benefits of a unified stack outweigh the costs?

## Result

After adopting Reason and BuckleScript between roughly 2017 and 2020, and migrating to **[Melange](https://melange.re/)** in 2023, Ahrefs now has a full-stack web development setup built around OCaml.

Now any engineer in the company can contribute across the entire web application, and because the backend and frontend share types, coordination overhead is greatly reduced.

Today, their public-facing web application is part of the same OCaml codebase that powers their backend systems. Serving **44% of Fortune 500 companies**, it handles around **500 billion HTTP requests on the backend** and **5 billion HTTP requests on the frontend** every day.

## Why OCaml

For Ahrefs, extending OCaml to the frontend wasn't about technological purity—it was about simplifying their business.

* **One mental model for the entire codebase** - Using a single programming language gives developers a simpler mental model of the codebase: no matter which team a developer is on, they can contribute across the entire stack.
* **Shared types eliminate coordination overhead** - Expressing the shape of data exchanged between frontend and backend in OCaml improves maintainability and simplifies development. Frontend and backend stay in sync: when the database schema changes, type errors guide developers to update all affected code in the API and even the web UI. Conversely, when the web UI changes, OCaml's type checker guides developers to make the related API or database changes.
* **Faster iteration cycles** - Type safety meant changes to data structures propagated safely throughout the entire application without runtime surprises, enabling rapid feature development.
* **Melange bridges ecosystems** - Access to the JavaScript ecosystem (React components, npm packages) while maintaining OCaml's compile-time guarantees meant they didn't have to choose between type safety and ecosystem richness.
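
As a minimal sketch of how shared types keep both sides in sync, assume a hypothetical `plan` variant defined once in a library that is compiled natively for the backend and with Melange for the frontend. The type and function names below are illustrative and are not taken from Ahrefs' codebase.

```ocaml
(* Hypothetical variant shared by the API layer and the web UI.
   In a real project it would live in a library built both natively
   and with Melange. *)
type plan = Free | Lite | Standard

(* Backend side: encode the plan for an API response. *)
let plan_to_string = function
  | Free -> "free"
  | Lite -> "lite"
  | Standard -> "standard"

(* Either side: decode it again; unknown strings are rejected. *)
let plan_of_string = function
  | "free" -> Some Free
  | "lite" -> Some Lite
  | "standard" -> Some Standard
  | _ -> None

(* Frontend side: the label rendered in the UI. *)
let plan_label = function
  | Free -> "Free trial"
  | Lite -> "Lite"
  | Standard -> "Standard"
```

Adding a constructor such as `Advanced` to `plan` makes `plan_to_string` and `plan_label` non-exhaustive, so the compiler points at both the API code and the UI code (warning 8, which many projects treat as an error). `plan_of_string` matches on strings, so it is not flagged and has to be updated by hand; a round-trip test over every constructor catches that gap.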

## Solution

Ahrefs built their full-stack solution around **[OCaml](https://ocaml.org/)** compiled to JavaScript via **[Melange](https://melange.re/)**, paired with **[reason-react](https://github.com/reasonml/reason-react)** for the user interface.

The cornerstone of their architecture is **[ATD (Adjustable Type Definitions)](https://github.com/ahrefs/atd)**, which Ahrefs developed to generate shared types for their frontend and backend, initially targeting BuckleScript.
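
ATD definitions are compiled by `atdgen` into OCaml types together with JSON readers and writers. The sketch below hand-writes the kind of code that would otherwise be generated, for a hypothetical `keyword_stats` record, so that it runs with the standard library alone; the code `atdgen` actually produces, and its function names, differ.

```ocaml
(* Hypothetical shared module: one type definition used by the native
   backend and, via Melange, by the browser. With ATD, the serializer
   below would be generated rather than written by hand. *)
module Shared_types = struct
  type keyword_stats = {
    keyword : string;
    monthly_volume : int;
    difficulty : int; (* hypothetical 0 to 100 score *)
  }

  (* Minimal JSON string escaping: quotes, backslashes and newlines only. *)
  let json_string s =
    let b = Buffer.create (String.length s + 2) in
    Buffer.add_char b '"';
    String.iter
      (function
        | '"' -> Buffer.add_string b "\\\""
        | '\\' -> Buffer.add_string b "\\\\"
        | '\n' -> Buffer.add_string b "\\n"
        | c -> Buffer.add_char b c)
      s;
    Buffer.add_char b '"';
    Buffer.contents b

  (* Destructuring every field means a renamed or removed field is a
     compile error here as well as in every consumer of the type. *)
  let keyword_stats_to_json { keyword; monthly_volume; difficulty } =
    Printf.sprintf {|{"keyword":%s,"monthly_volume":%d,"difficulty":%d}|}
      (json_string keyword) monthly_volume difficulty
end
```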

Their frontend follows React patterns. Components are written in OCaml and compiled to JavaScript, with state management and data flow handled through React paradigms. Expressing all of this through OCaml ensures that data shapes match across the entire application.
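
In this style, UI state transitions can live in a plain OCaml reducer. As a sketch, assuming a hypothetical keyword table with paging and sorting, the function below has the `state -> action -> state` shape that reason-react's `React.useReducer` expects; because it has no browser dependencies, it can also be unit-tested natively.

```ocaml
(* Hypothetical UI state for a paginated, sortable table. *)
type sort_order = Ascending | Descending

type state = { page : int; sort : sort_order; loading : bool }

(* Every user interaction the component can dispatch. *)
type action = Next_page | Previous_page | Toggle_sort | Set_loading of bool

let initial_state = { page = 1; sort = Descending; loading = false }

let flip = function Ascending -> Descending | Descending -> Ascending

(* Pure reducer: returns a new state and never mutates the old one. *)
let reducer state action =
  match action with
  | Next_page -> { state with page = state.page + 1 }
  | Previous_page -> { state with page = max 1 (state.page - 1) }
  | Toggle_sort -> { state with sort = flip state.sort }
  | Set_loading loading -> { state with loading }
```

Inside a component it would be wired up with something like `let state, dispatch = React.useReducer reducer initial_state`, and the `action` variant guarantees that every action the UI can dispatch is handled.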

Integration with their existing data infrastructure (**[ClickHouse](https://clickhouse.com/)**, **[MySQL](https://www.mysql.com/)**, **[Elasticsearch](https://www.elastic.co/)** on **[AWS](https://aws.amazon.com/)**) is seamless because frontend and backend share the same type definitions. Rather than maintaining separate API contracts, they treat the database as the source of truth, and data shapes are automatically reflected throughout the application.

## Lessons Learned

* **Shared types eliminate entire bug categories**: Automatic synchronization between frontend and backend data structures prevents integration issues that commonly plague web applications.
* **Team learning pays off**: Transitioning frontend developers to OCaml requires investment, but tools like the **[Melange playground](https://melange.re/v5.0.0/playground)** make onboarding approachable. Productivity gains compound over time.
* **Writing bindings is manageable with good resources**: Interfacing with the existing JavaScript ecosystem initially seemed daunting, but the **[official Melange documentation](https://melange.re/)** and **[community resources](https://github.com/melange-community/bindings)** provide clear guidance for common patterns.
* **Gradual migration reduces risk**: Starting with small components or isolated features allows teams to build confidence before transitioning entire applications.

## Open Source

Ahrefs contributes actively to the full-stack OCaml ecosystem, sharing tools that benefit the broader community:

* **[styled-ppx](https://github.com/davesnx/styled-ppx):** Type-safe styled components with typed CSS for ReScript, Melange, and native OCaml.
* **[server-reason-react](https://github.com/ml-in-barcelona/server-reason-react):** A native implementation of React's server-side rendering (SSR) and React Server Components (RSC) architecture for Reason.
* **[melange-recharts](https://github.com/ahrefs/melange-recharts):** Production-ready charting components for data visualization applications.
* **[melange-json](https://github.com/melange-community/melange-json):** Streamlined JSON handling for frontend applications.
* **[ocaml-mlx/mlx](https://github.com/ocaml-mlx/mlx):** An OCaml syntax dialect (`.mlx` files) that adds JSX expressions.
56 changes: 56 additions & 0 deletions data/success_stories/ahrefs-petabyte-crawler.md
@@ -0,0 +1,56 @@
---
title: Petabyte-Scale Web Crawling and Data Processing
logo: success-stories/ahrefs.svg
card_logo: success-stories/white/ahrefs.svg
background: /success-stories/ahrefs-bg.jpg
theme: blue
synopsis: "Ahrefs built the world's third-largest web crawler using OCaml, indexing petabytes of web data with a lean, efficient team."
url: https://ahrefs.com/
priority: 2
why_ocaml_reasons:
- Performance
- Reliability
- Expressiveness
- Scalability
- Maintainability
---

## Challenge

[Ahrefs](https://ahrefs.com/) is a Singapore-based SaaS company that provides comprehensive SEO tools and marketing intelligence powered by big data. Since 2011, they've been crawling the entire web daily to maintain extensive databases of backlinks, keywords, and website analytics that help businesses with SEO strategy, competitor analysis, and content optimization. Today, they're trusted by 44% of Fortune 500 companies.

Building and operating a web crawler at internet scale presents extraordinary challenges. Ahrefs needs to index billions of web pages continuously, process petabytes of data in real time, and turn this massive dataset into actionable insights for thousands of customers worldwide. The technical demands are staggering: their systems must handle **500 billion backend requests per day** while maintaining **over 100 PB of storage**.

As a self-funded company, Ahrefs couldn't solve these challenges by throwing unlimited resources at the problem. They needed maximum efficiency from a small team — systems that could run reliably for months without intervention, code that could be understood and maintained by a lean engineering organization, and performance that could compete with tech giants despite having a fraction of their headcount.

The question wasn't just whether they could build a web-scale crawler, but whether they could do it sustainably with the constraints of a bootstrapped company.

## Result

Over a decade later, Ahrefs runs one of the world's most sophisticated web crawling operations. Their OCaml-powered systems maintain an index of **492.7 billion pages** across **500.4 million domains**.

This technical achievement translates directly to business success. Ahrefs has grown into a **$100M+ ARR company** with **150 employees** managing **4000+ servers**—all while maintaining their original philosophy of operational efficiency. They've become the sector leader in SEO tools, proving that the right technology choices can create sustainable competitive advantages.

The reliability of their OCaml systems is perhaps most impressive: programs written years ago continue running without surprises, requiring minimal maintenance from their engineering team. This "boring" reliability has allowed Ahrefs to focus engineering effort on building new features and capabilities rather than fighting infrastructure fires.

Their success demonstrates that OCaml can power not just technical excellence at massive scale, but sustainable business growth in highly competitive markets.

## Solution

Ahrefs built their crawling infrastructure around OCaml's strengths, creating a distributed system that balances performance, reliability, and maintainability. **[OCaml](https://ocaml.org/)** serves as the primary language for all crawling and data processing systems, compiled natively for maximum performance across their **4000+ servers**.

Their architecture treats data consistency as paramount. By defining shared data structures (using **[ATD (Adjustable Type Definitions)](https://github.com/ahrefs/atd)**, and now migrating to [melange-json](https://github.com/melange-community/melange-json)), they ensure type safety throughout their processing pipeline, from initial web crawling to final data storage. This approach catches schema mismatches at compile time rather than at runtime, which is crucial when processing billions of pages daily.
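
To illustrate how typed stage boundaries catch mismatches at compile time, here is a minimal sketch of three pipeline stages with hypothetical record types. Each stage's input is the previous stage's output type, so renaming a field or changing a stage's result shape breaks compilation instead of corrupting data downstream. The naive `href` scanner only stands in for a real HTML parser.

```ocaml
(* Hypothetical outputs of two pipeline stages. *)
type fetched_page = { url : string; status : int; body : string }
type parsed_page = { page_url : string; outlinks : string list }

(* Naive extraction of href="..." targets; illustration only. *)
let extract_links body =
  let marker = "href=\"" in
  let mlen = String.length marker in
  let n = String.length body in
  let rec scan i acc =
    if i + mlen > n then List.rev acc
    else if String.sub body i mlen = marker then (
      match String.index_from_opt body (i + mlen) '"' with
      | Some j -> scan (j + 1) (String.sub body (i + mlen) (j - i - mlen) :: acc)
      | None -> List.rev acc)
    else scan (i + 1) acc
  in
  scan 0 []

(* Stage 2: only successful fetches are parsed. *)
let parse (page : fetched_page) : parsed_page option =
  if page.status = 200 then
    Some { page_url = page.url; outlinks = extract_links page.body }
  else None

(* Stage 3: flatten into (source, target) rows for a hypothetical
   backlinks table. *)
let to_rows (p : parsed_page) : (string * string) list =
  List.map (fun target -> (p.page_url, target)) p.outlinks

let process (page : fetched_page) : (string * string) list =
  match parse page with Some p -> to_rows p | None -> []
```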

Their storage layer combines **[ClickHouse](https://clickhouse.com/)**, **[MySQL](https://www.mysql.com/)**, and **[Elasticsearch](https://www.elastic.co/)**. The key insight was to design these systems to work together seamlessly through shared OCaml types rather than complex API layers.

Ahrefs maintains their own libraries and frameworks rather than relying on generic solutions. This "build it ourselves" philosophy requires more initial investment but delivers systems perfectly tailored to web crawling demands. Their **1.5 million lines of OCaml code** represent years of accumulated domain expertise encoded in reliable, maintainable software.

The result is a unified system where improvements to crawling algorithms, data processing pipelines, or storage efficiency can be implemented quickly and deployed confidently across their entire infrastructure.

## Why OCaml

* **Low maintenance burden**: OCaml systems built years ago continue running without intervention, allowing engineers to focus on new development rather than troubleshooting production issues.
* **Static typing catches errors**: At petabyte scale, compile-time type checking prevents data format inconsistencies and runtime failures that would be expensive to debug in production environments processing large volumes of web data.
* **Language expressiveness reduces development time**: OCaml's abstractions enabled building domain-specific systems efficiently rather than adapting existing frameworks. Small teams could develop complex crawling and data processing systems with relatively few lines of code.
* **Performance**: Native compilation provides the throughput needed for processing billions of daily requests while maintaining code readability for long-term maintenance.
* **Cost-effective specialized tooling**: OCaml made it practical to build custom systems tailored to specific requirements rather than using general-purpose solutions, which aligned with their business constraints of limited engineering resources.
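
One common way static typing prevents data format inconsistencies is an abstract type with a single constructor function. In this sketch, a hypothetical `Normalized_url` module is the only way to obtain a URL usable as a storage key, so code that passes a raw, unnormalized string is rejected at compile time. The normalization rules here (trimming whitespace, dropping the `#fragment`) are deliberately minimal.

```ocaml
(* Hypothetical module: the signature hides that [t] is a string, so
   values of type [t] can only come from [normalize]. *)
module Normalized_url : sig
  type t
  val normalize : string -> t
  val to_string : t -> string
end = struct
  type t = string

  (* Trim whitespace and drop any "#fragment", which is never sent to
     the server and would otherwise create duplicate keys. *)
  let normalize raw =
    let s = String.trim raw in
    match String.index_opt s '#' with
    | Some i -> String.sub s 0 i
    | None -> s

  let to_string t = t
end

(* Storage keys can only be built from normalized URLs. *)
let storage_key (u : Normalized_url.t) = "page:" ^ Normalized_url.to_string u
```

Calling `storage_key "https://example.com/docs"` with a plain string fails to type-check, which moves this class of inconsistency from production data to compile time.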
30 changes: 0 additions & 30 deletions data/success_stories/ahrefs.md

This file was deleted.

2 changes: 2 additions & 0 deletions src/ocamlorg_web/lib/redirection.ml
@@ -252,6 +252,8 @@ let from_v2 =
("/docs/platform-users", Url.tool_page "platform-users");
("/docs/platform-roadmap", Url.tool_page "platform-roadmap");
("/docs/configuring-your-editor", Url.tutorial "set-up-editor");
( "/success-stories/peta-byte-scale-web-crawler",
Url.success_story "peta-byte-scale-web-crawling-and-data-processing" );
]

let make ?(permanent = false) t =