(data) Update the Ahrefs success story and add a new story about the full stack ocaml web solution #2863
base: main
`data/success_stories/ahrefs-full-stack-web.md`

@@ -0,0 +1,69 @@
---
title: Full-Stack React Web Application With OCaml
logo: success-stories/ahrefs.svg
card_logo: success-stories/white/ahrefs.svg
background: /success-stories/ahrefs-full-stack-bg.jpg
theme: blue
synopsis: "Ahrefs transitioned from PHP/jQuery to full-stack OCaml using Melange and React, eliminating team silos and enabling any engineer to contribute across their entire web application stack."
url: https://ahrefs.com/
priority: 2
why_ocaml_reasons:
- Integration with JavaScript Ecosystem
- Shared Data Types
- Developer Productivity
- Code Reliability
---

## Challenge

[Ahrefs](https://ahrefs.com/) is a Singapore-based SaaS company that provides SEO tools and marketing intelligence powered by big data. Since 2011, they've built their business around OCaml, using it for web crawling and data processing to serve thousands of customers worldwide. Today, they're trusted by 44% of Fortune 500 companies and operate as a lean, self-funded organization focused on efficiency.

By 2017, Ahrefs had built a successful SEO tools business powered by OCaml on the backend, but they faced a bottleneck in web application development. Their frontend was built with PHP and jQuery, while their data processing lived in OCaml. Every time frontend developers needed backend data, they had to coordinate with backend engineers to update the APIs.

In 2017, JavaScript tooling for the frontend lagged well behind today's TypeScript ecosystem. When BuckleScript and Reason appeared around the same time, Ahrefs saw an opportunity to use OCaml on both the backend and the frontend of their web application.

The challenge was both technical and cultural. Could they move the entire frontend team to OCaml, even though some engineers had never used a functional programming language? Would the benefits of a unified stack outweigh the costs?

## Result

After adopting Reason and BuckleScript between roughly 2017 and 2020, and migrating to **[Melange](https://melange.re/)** in 2023, Ahrefs has built a full-stack web development setup around OCaml.

Now, any engineer in the company can contribute across the entire web application. Because the backend and frontend share types, coordination overhead is greatly reduced.

Today, their public-facing web application belongs to the same OCaml codebase that powers their backend systems. Serving **44% of Fortune 500 companies**, the web application handles around **500 billion HTTP requests on the backend** and **5 billion HTTP requests on the frontend** every day.
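
For a sense of scale, those daily totals correspond to the following average rates (averages over a full day, assuming 86,400 seconds per day; peak rates are not stated):

$$
\frac{500 \times 10^{9}\ \text{requests}}{86\,400\ \text{s}} \approx 5.8 \times 10^{6}\ \text{requests/s (backend)},
\qquad
\frac{5 \times 10^{9}\ \text{requests}}{86\,400\ \text{s}} \approx 5.8 \times 10^{4}\ \text{requests/s (frontend)}
$$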

## Why OCaml

For Ahrefs, extending OCaml to the frontend wasn't about technological purity; it was about simplifying their business.

* **One mental model for the entire codebase** - Using a single programming language gives every developer working on the codebase a simpler mental model: no matter which team a developer is on, they can contribute across the entire stack.
* **Shared types eliminate coordination overhead** - Using OCaml to express the shape of data exchanged between frontend and backend increases maintainability and simplifies development. Frontend and backend stay in sync: when the database schema changes, type errors guide developers to update all affected code in the API and even in the web UI. Conversely, when the web UI changes, OCaml's type checker guides developers to make the related API or database changes.
* **Faster iteration cycles** - Type safety meant that changes to data structures propagated safely throughout the entire application without runtime surprises, enabling rapid feature development.
* **Melange bridges ecosystems** - Access to the JavaScript ecosystem (React components, npm packages) combined with OCaml's compile-time guarantees meant they didn't have to choose between type safety and ecosystem richness.
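
A minimal sketch of how type errors can guide developers across the stack, assuming a hypothetical `report_status` type shared by backend and frontend code. The names and HTTP codes are illustrative only, not Ahrefs' API. Both functions match on every constructor without a wildcard case:

```ocaml
(* Hypothetical variant shared by backend API code and frontend UI code. *)
type report_status =
  | Queued
  | Running of int (* percent complete *)
  | Done
  | Failed of string

(* Backend side: HTTP status code for a status query (illustrative values). *)
let http_code = function
  | Queued | Running _ -> 202
  | Done -> 200
  | Failed _ -> 500

(* Frontend side: label shown in the UI. *)
let label = function
  | Queued -> "Waiting to start"
  | Running pct -> Printf.sprintf "%d%% done" pct
  | Done -> "Ready"
  | Failed reason -> "Error: " ^ reason
```

If someone adds a constructor such as `Cancelled` to `report_status`, the compiler reports a non-exhaustive match (warning 8) in both `http_code` and `label`; in builds that treat that warning as an error, nothing compiles until both sides handle the new case. A catch-all `_ ->` branch would silence this check, so code that relies on the guarantee avoids wildcards on shared types.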

## Solution

Ahrefs built their full-stack solution around **[OCaml](https://ocaml.org/)** compiled to JavaScript via **[Melange](https://melange.re/)**, paired with **[reason-react](https://github.com/reasonml/reason-react)** for the user interface.

The cornerstone of their architecture is **[ATD (Adjustable Type Definitions)](https://github.com/ahrefs/atd)**, which Ahrefs maintains and uses to generate the types shared by their frontend and backend. On the frontend, this initially meant generating BuckleScript code.
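
The shared-types idea can be sketched in plain OCaml. This is a hypothetical stand-in, not ATD output: a `Shared` module holds a `keyword` record with a hand-written encoder and decoder over a tiny JSON tree type (an assumption made so the example needs only the standard library). In a real setup, a generator such as ATD would produce this code from one definition, and both the native backend and the Melange frontend would depend on the same module.

```ocaml
(* Hypothetical shared module. With ATD, the type and codecs below would be
   generated from one definition instead of written by hand. *)
module Shared = struct
  (* Tiny JSON tree: an assumption so the example needs no libraries. *)
  type json =
    | Int of int
    | String of string
    | Null
    | Object of (string * json) list

  type keyword = { phrase : string; volume : int; difficulty : int option }

  let keyword_to_json k =
    Object
      [
        ("phrase", String k.phrase);
        ("volume", Int k.volume);
        ("difficulty", match k.difficulty with Some d -> Int d | None -> Null);
      ]

  (* Decoding returns None if a field is missing or has the wrong type. *)
  let keyword_of_json = function
    | Object fields -> (
        match
          ( List.assoc_opt "phrase" fields,
            List.assoc_opt "volume" fields,
            List.assoc_opt "difficulty" fields )
        with
        | Some (String phrase), Some (Int volume), (None | Some Null) ->
            Some { phrase; volume; difficulty = None }
        | Some (String phrase), Some (Int volume), Some (Int d) ->
            Some { phrase; volume; difficulty = Some d }
        | _ -> None)
    | _ -> None
end

(* Backend side: serialises a value of the shared type. *)
let backend_response () =
  Shared.keyword_to_json
    { Shared.phrase = "ocaml web"; volume = 1200; difficulty = Some 35 }

(* Frontend side: decodes the payload back into the same type. *)
let frontend_render json =
  match Shared.keyword_of_json json with
  | Some k -> Printf.sprintf "%s (%d searches)" k.Shared.phrase k.Shared.volume
  | None -> "invalid payload"
```

Renaming `volume` or changing its type in `Shared.keyword` stops both `backend_response` and `frontend_render` from compiling until they are updated, which is the single-source-of-truth property the shared definitions provide.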

Their frontend follows React patterns: components are written in OCaml and compiled to JavaScript, with state management and data flow handled through React idioms. Expressing all of this in OCaml ensures that data shapes match across the entire application.
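
As an illustration of React-style state management in OCaml, the sketch below models a hypothetical keyword table's UI state as a record and its transitions as a variant-typed reducer. It is plain OCaml so it runs natively; in a reason-react component, a function of this shape would typically be handed to `React.useReducer`, but that wiring is omitted here and the state and action names are assumptions.

```ocaml
(* Hypothetical UI state for a paginated, filterable keyword table. *)
type state = { page : int; sort_desc : bool; filter : string }

(* Every user interaction the table supports. *)
type action = Next_page | Prev_page | Toggle_sort | Set_filter of string

(* Pure reducer: computes the next state and performs no side effects. *)
let reducer state = function
  | Next_page -> { state with page = state.page + 1 }
  | Prev_page -> { state with page = max 1 (state.page - 1) }
  | Toggle_sort -> { state with sort_desc = not state.sort_desc }
  (* A new filter resets pagination to the first page. *)
  | Set_filter f -> { state with filter = f; page = 1 }

let initial = { page = 1; sort_desc = true; filter = "" }
```

Because `action` is a closed variant, dispatching an action that does not exist is a compile-time error, and adding a new action triggers a non-exhaustive match warning in `reducer` until it is handled.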

Integration with their existing data infrastructure (**[ClickHouse](https://clickhouse.com/)**, **[MySQL](https://www.mysql.com/)**, **[Elasticsearch](https://www.elastic.co/)** on **[AWS](https://aws.amazon.com/)**) is seamless because the frontend and backend share the same type definitions. Rather than maintaining separate API contracts, the database serves as the source of truth, and data shapes are automatically reflected throughout the application.

## Lessons Learned

* **Shared types eliminate entire categories of bugs**: Automatic synchronization between frontend and backend data structures prevents integration issues that commonly plague web applications.
* **Team learning pays off**: Transitioning frontend developers to OCaml requires investment, but tools like the **[Melange playground](https://melange.re/v5.0.0/playground)** make onboarding approachable. Productivity gains compound over time.
* **Writing bindings is manageable with good resources**: Interfacing with the existing JavaScript ecosystem initially seemed daunting, but the **[official Melange documentation](https://melange.re/)** and **[community resources](https://github.com/melange-community/bindings)** provide clear guidance for common patterns.
* **Gradual migration reduces risk**: Starting with small components or isolated features allows teams to build confidence before transitioning entire applications.

## Open Source

Ahrefs contributes actively to the full-stack OCaml ecosystem, sharing tools that benefit the broader community:

- **[styled-ppx](https://github.com/davesnx/styled-ppx):** Styled components with type-safe CSS for ReScript, Melange, and native OCaml.
- **[server-reason-react](https://github.com/ml-in-barcelona/server-reason-react):** Native implementation of React's server-side rendering (SSR) and React Server Components (RSC) architecture for Reason.
- **[melange-recharts](https://github.com/ahrefs/melange-recharts):** Production-ready charting components for data visualization applications.
- **[melange-json](https://github.com/melange-community/melange-json):** Streamlined JSON handling for frontend applications.
- **[ocaml-mlx/mlx](https://github.com/ocaml-mlx/mlx):** An OCaml syntax dialect (`.mlx` files) that adds JSX expressions.

@@ -0,0 +1,56 @@
---
title: Petabyte-Scale Web Crawling and Data Processing
logo: success-stories/ahrefs.svg
card_logo: success-stories/white/ahrefs.svg
background: /success-stories/ahrefs-bg.jpg
theme: blue
synopsis: "Ahrefs built the world's third-largest web crawler using OCaml, indexing petabytes of web data with a lean, efficient team."
url: https://ahrefs.com/
priority: 2
why_ocaml_reasons:
- Performance
- Reliability
- Expressiveness
- Scalability
- Maintainability
---

## Challenge

[Ahrefs](https://ahrefs.com/) is a Singapore-based SaaS company that provides comprehensive SEO tools and marketing intelligence powered by big data. Since 2011, they've been crawling the entire web daily to maintain extensive databases of backlinks, keywords, and website analytics that help businesses with SEO strategy, competitor analysis, and content optimization. Today, they're trusted by 44% of Fortune 500 companies.

Building and operating a web crawler at internet scale presents extraordinary challenges. Ahrefs needs to index billions of web pages continuously, process petabytes of data in real time, and turn this massive dataset into actionable insights for thousands of customers worldwide. The technical demands are staggering: their systems must handle **500 billion backend requests per day** while maintaining **over 100 PB of storage**.

As a self-funded company, Ahrefs couldn't solve these challenges by throwing unlimited resources at the problem. They needed maximum efficiency from a small team: systems that could run reliably for months without intervention, code that a lean engineering organization could understand and maintain, and performance that could compete with tech giants despite a fraction of their headcount.

The question wasn't just whether they could build a web-scale crawler, but whether they could do it sustainably within the constraints of a bootstrapped company.

## Result

Over a decade later, Ahrefs operates one of the world's most sophisticated web crawling operations. Their OCaml-powered systems maintain an index of **492.7 billion pages** across **500.4 million domains**.

This technical achievement translates directly into business success. Ahrefs has grown into a **$100M+ ARR company** with **150 employees** managing **4,000+ servers**, all while maintaining their original philosophy of operational efficiency. They've become the sector leader in SEO tools, showing that the right technology choices can create sustainable competitive advantages.

The reliability of their OCaml systems is perhaps most impressive: programs written years ago continue running without surprises, requiring minimal maintenance from their engineering team. This "boring" reliability has allowed Ahrefs to focus engineering effort on building new features and capabilities rather than fighting infrastructure fires.

Their success demonstrates that OCaml can power not just technical excellence at massive scale, but also sustainable business growth in highly competitive markets.

## Solution

Ahrefs built their crawling infrastructure around OCaml's strengths, creating a distributed system that balances performance, reliability, and maintainability. **[OCaml](https://ocaml.org/)** serves as the primary language for all crawling and data processing systems, compiled natively for maximum performance across their **4,000+ servers**.

Their architecture treats data consistency as paramount. By defining shared data structures (using **[ATD (Adjustable Type Definitions)](https://github.com/ahrefs/atd)**, and now moving to **[melange-json](https://github.com/melange-community/melange-json)**), they ensure type safety throughout their processing pipeline, from initial web crawling to final data storage. This approach catches schema mismatches at compile time rather than at runtime, which is crucial when processing billions of pages daily.
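
Compile-time checking between pipeline stages can be sketched with three hypothetical record types (`fetched`, `parsed`, `backlink`) and a deliberately naive link extractor that treats whitespace-separated tokens starting with `http` as links. This only illustrates the typing pattern and is not Ahrefs' crawler; in their system the shared types come from ATD or melange-json definitions rather than hand-written records.

```ocaml
(* Hypothetical record types for three pipeline stages. *)
type fetched = { url : string; status : int; body : string }
type parsed = { page_url : string; links : string list }
type backlink = { source : string; target : string }

(* Naive stand-in for real HTML link extraction. *)
let is_link word = String.length word >= 4 && String.sub word 0 4 = "http"

(* Fetch -> parse: only pages fetched with HTTP 200 are parsed. *)
let parse (f : fetched) : parsed option =
  if f.status <> 200 then None
  else
    Some
      {
        page_url = f.url;
        links = List.filter is_link (String.split_on_char ' ' f.body);
      }

(* Parse -> store: one backlink row per outgoing link. *)
let to_rows (p : parsed) : backlink list =
  List.map (fun target -> { source = p.page_url; target }) p.links

(* The whole pipeline; the compiler checks every stage boundary. *)
let run (pages : fetched list) : backlink list =
  List.concat_map
    (fun f -> match parse f with Some p -> to_rows p | None -> [])
    pages
```

Because each stage's signature names its input and output types, passing a `fetched` page directly to `to_rows`, or storing a `parsed` value where `backlink` rows are expected, is rejected by the type checker before anything runs.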

Their storage layer combines **[ClickHouse](https://clickhouse.com/)**, **[MySQL](https://www.mysql.com/)**, and **[Elasticsearch](https://www.elastic.co/)**. The key insight was designing these systems to work together through shared OCaml types rather than complex API layers.

Ahrefs maintains their own libraries and frameworks rather than relying on generic solutions. This "build it ourselves" philosophy requires more initial investment but delivers systems tailored to the demands of web crawling. Their **1.5 million lines of OCaml code** represent years of accumulated domain expertise encoded in reliable, maintainable software.

The result is a unified system where improvements to crawling algorithms, data processing pipelines, or storage efficiency can be implemented quickly and deployed confidently across their entire infrastructure.

## Why OCaml

* **Low maintenance burden**: OCaml systems built years ago continue running without intervention, allowing engineers to focus on new development rather than troubleshooting production issues.
* **Static typing catches errors**: At petabyte scale, compile-time type checking prevents data format inconsistencies and runtime failures that would be expensive to debug in production environments processing large volumes of web data.
* **Language expressiveness reduces development time**: OCaml's abstractions enabled building domain-specific systems efficiently rather than adapting existing frameworks. Small teams could develop complex crawling and data processing systems with relatively few lines of code.
* **Performance**: Native compilation provides the throughput needed for processing billions of daily requests while maintaining code readability for long-term maintenance.
* **Cost-effective specialized tooling**: OCaml made it practical to build custom systems tailored to specific requirements rather than using general-purpose solutions, which aligned with their business constraints of limited engineering resources.
This file was deleted.