This project scrapes bank routing codes from a series of web pages, processes the scraped data to produce a refined CSV file, and then embeds that data into a self-contained Go HTTP service binary. The HTTP service provides a JSON endpoint for querying bank details by routing code.
- Crawler (Python): A Python script (crawler.py) that:
  - Iterates over a series of URLs containing HTML tables of bank routing codes.
  - Parses each page with BeautifulSoup.
  - Aggregates all routing data into a bank_routing_codes.csv file.
- Parser (Python): A second Python script (parser.py) that:
  - Reads bank_routing_codes.csv.
  - Extracts and separates the various code and name fields into distinct columns.
  - Outputs a cleaner CSV file (e.g. data.csv) suitable for embedding in the Go service.
- Go HTTP Service: A Go program that:
  - Embeds data.csv into the binary using Go’s embed package.
  - Provides a /routing/{routingNumber} HTTP endpoint.
  - Returns JSON in the format shown in the sample response, with the routing code reported as IFSC.
- Python:
  - Python 3.x
  - requests, beautifulsoup4
- Go:
  - Go 1.16+ (for embed)
  - github.com/gorilla/mux
- Ensure all source files are in place:
  - crawler.py and parser.py in the project’s main directory.
  - main.go and data.csv (to be generated) inside the routing/ directory.
- Run the build script: The build.sh script handles the entire workflow.

      chmod +x build.sh
      ./build.sh

  What this does: build.sh runs the multi-stage Dockerfile, which:
  - Creates and activates a Python virtual environment (venv).
  - Installs Python dependencies (requests, beautifulsoup4).
  - Runs crawler.py to generate bank_routing_codes.csv.
  - Runs parser.py to process bank_routing_codes.csv into data.csv.
  - Copies data.csv into the routing directory.
  - Changes into the routing directory, initializes or updates the Go module, and builds the Go executable (routing-server).
- Run the Go executable: After build.sh completes, the routing-server binary will be found in the out directory.

      cd out
      ./routing-server

  The service will start on port 8080. Access it via:

      curl http://localhost:8080/routing/ROUTING_NUMBER
Sample JSON Response
When querying a valid routing number, the service returns JSON in the following format:
    {
      "BANK": "Some Bank Name",
      "IFSC": "123456789",
      "BRANCH": "Some Branch",
      "ADDRESS": "N/A",
      "CONTACT": "N/A",
      "CITY": "N/A",
      "DISTRICT": "Some District",
      "STATE": "N/A",
      "RTGS": false,
      "BANKCODE": "123"
    }

IFSC is populated with the requested routing code, and other non-available fields are set to placeholders (N/A, false).
By default, the server starts on port 8080. You can specify a different port by setting the PORT environment variable:

    PORT=9090 ./routing-server

- Adjusting URLs in crawler.py: Update the base URL pattern or number of pages to scrape.
- Adjusting CSV parsing in parser.py: Change code and name extraction logic if the input CSV structure changes.
- Modifying the Go JSON structure: Update ResponseJSON in main.go or related logic to reflect your final desired output and data availability.
This project is open-source and available under [LICENSE] (if applicable).