From b3e23a3244438881468d04c4ef7d1064670f0469 Mon Sep 17 00:00:00 2001 From: henrikvalv3 Date: Thu, 25 Sep 2025 13:07:11 +0100 Subject: [PATCH 1/3] Nevada site selection notebook --- .../notebooks/nevada_site_selection/README.md | 35 + ...sights_nevada_site_selection_example.ipynb | 902 ++++++++++++++++++ 2 files changed, 937 insertions(+) create mode 100644 places_insights/notebooks/nevada_site_selection/README.md create mode 100644 places_insights/notebooks/nevada_site_selection/places_insights_nevada_site_selection_example.ipynb diff --git a/places_insights/notebooks/nevada_site_selection/README.md b/places_insights/notebooks/nevada_site_selection/README.md new file mode 100644 index 0000000..0025e09 --- /dev/null +++ b/places_insights/notebooks/nevada_site_selection/README.md @@ -0,0 +1,35 @@ +### **Site Selection in Las Vegas using Places Insights and BigQuery** + +**Overall Goal** + +This notebook demonstrates a multi-stage site selection workflow for a new coffee shop in Las Vegas. It combines broad competitive analysis, custom commercial suitability scoring, and target market density analysis to identify prime locations, then visualizes the results on a combined, interactive map. + +**Key Technologies Used** + +* **[Places Insights](https://developers.google.com/maps/documentation/placesinsights)**: To provide the core Places dataset and the `PLACES_COUNT_PER_H3` function. +* **[BigQuery](https://cloud.google.com/bigquery):** To perform large-scale geospatial analysis and calculate suitability scores. +* **[Google Maps Place Details API](https://developers.google.com/maps/documentation/places/web-service/place-details):** To fetch rich, detailed information (name, address, rating) for specific ground-truth locations. +* **[Google Maps 2D Tiles](https://developers.google.com/maps/documentation/tile/2d-tiles-overview):** To use Google Maps as the interactive basemap.
+* **Python Libraries:** + * **[GeoPandas](https://geopandas.org/en/stable/)** for spatial data manipulation. + * **[Folium](https://python-visualization.github.io/folium/latest/)** for creating the final interactive, layered map. + +See [Google Maps Platform Pricing](https://mapsplatform.google.com/intl/en_uk/pricing/) for API costs associated with running this notebook. + +**The Step-by-Step Workflow** + +1. **Analyze Competitor Density:** We begin by using BigQuery to analyze the distribution of major competitor brands across Clark County ZIP codes. This initial step helps identify broad areas with lower market saturation. + +2. **Identify Prime Commercial Zones:** The notebook then runs a more sophisticated query to calculate a custom suitability score for H3 hexagonal cells. This score is based on the weighted density of complementary businesses (restaurants, bars, casinos, tourist attractions), pinpointing the most commercially vibrant areas. + +3. **Find Target Market Hotspots & Synthesize:** Next, we use the `PLACES_COUNT_PER_H3` function to find the density of our target business type—coffee shops. The notebook then **automatically** cross-references these coffee shop counts with the highest-scoring suitability zones to identify the most promising cells for a new location. + +4. **Create a Combined Visualization:** In the final step, we generate a single, layered map. The **base layer** is a choropleth "heatmap" showing the suitability scores across Las Vegas. The **top layer** displays individual pins for existing coffee shops in the top-ranked zones, providing a direct, ground-level view of the current market landscape. + +**How to Use This Notebook** + +1. **Set Up Secrets:** Before you begin, you must configure two secrets in the Colab “Secrets” tab (the 🔑 key icon on the left menu): + * `GCP_PROJECT`: Your Google Cloud Project ID with access to Places Insights. + * `GMP_API_KEY`: Your Google Maps Platform API key.
Ensure the **Map Tiles API** and **Places API (new)** are enabled for this key in your GCP console. + +2. **Run the Cells:** Once the secrets are set, simply run the cells in order from top to bottom. Each visualization will appear as the output of its corresponding code cell. \ No newline at end of file diff --git a/places_insights/notebooks/nevada_site_selection/places_insights_nevada_site_selection_example.ipynb b/places_insights/notebooks/nevada_site_selection/places_insights_nevada_site_selection_example.ipynb new file mode 100644 index 0000000..e0fb49f --- /dev/null +++ b/places_insights/notebooks/nevada_site_selection/places_insights_nevada_site_selection_example.ipynb @@ -0,0 +1,902 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "### **Site Selection in Las Vegas using Places Insights and BigQuery**\n", + "\n", + "**Overall Goal**\n", + "\n", + "This notebook demonstrates a multi-stage site selection workflow for a new coffee shop in Las Vegas.
It combines broad competitive analysis, custom commercial suitability scoring, and target market density analysis to identify prime locations, then visualizes the results on a combined, interactive map.\n", + "\n", + "**Key Technologies Used**\n", + "\n", + "* **[Places Insights](https://developers.google.com/maps/documentation/placesinsights)**: To provide the core Places dataset and the `PLACES_COUNT_PER_H3` function.\n", + "* **[BigQuery](https://cloud.google.com/bigquery):** To perform large-scale geospatial analysis and calculate suitability scores.\n", + "* **[Google Maps Place Details API](https://developers.google.com/maps/documentation/places/web-service/place-details):** To fetch rich, detailed information (name, address, rating) for specific ground-truth locations.\n", + "* **[Google Maps 2D Tiles](https://developers.google.com/maps/documentation/tile/2d-tiles-overview):** To use Google Maps as the interactive basemap.\n", + "* **Python Libraries:**\n", + " * **[GeoPandas](https://geopandas.org/en/stable/)** for spatial data manipulation.\n", + " * **[Folium](https://python-visualization.github.io/folium/latest/)** for creating the final interactive, layered map.\n", + "\n", + "See [Google Maps Platform Pricing](https://mapsplatform.google.com/intl/en_uk/pricing/) for API costs associated with running this notebook.\n", + "\n", + "**The Step-by-Step Workflow**\n", + "\n", + "1. **Analyze Competitor Density:** We begin by using BigQuery to analyze the distribution of major competitor brands across Clark County ZIP codes. This initial step helps identify broad areas with lower market saturation.\n", + "\n", + "2. **Identify Prime Commercial Zones:** The notebook then runs a more sophisticated query to calculate a custom suitability score for H3 hexagonal cells. This score is based on the weighted density of complementary businesses (restaurants, bars, casinos, tourist attractions), pinpointing the most commercially vibrant areas.\n", + "\n", + "3.
**Find Target Market Hotspots & Synthesize:** Next, we use the `PLACES_COUNT_PER_H3` function to find the density of our target business type—coffee shops. The notebook then **automatically** cross-references these coffee shop counts with the highest-scoring suitability zones to identify the most promising cells for a new location.\n", + "\n", + "4. **Create a Combined Visualization:** In the final step, we generate a single, layered map. The **base layer** is a choropleth \"heatmap\" showing the suitability scores across Las Vegas. The **top layer** displays individual pins for existing coffee shops in the top-ranked zones, providing a direct, ground-level view of the current market landscape.\n", + "\n", + "**How to Use This Notebook**\n", + "\n", + "1. **Set Up Secrets:** Before you begin, you must configure two secrets in the Colab “Secrets” tab (the 🔑 key icon on the left menu):\n", + " * `GCP_PROJECT`: Your Google Cloud Project ID with access to Places Insights.\n", + " * `GMP_API_KEY`: Your Google Maps Platform API key. Ensure the **Map Tiles API** and **Places API (new)** are enabled for this key in your GCP console.\n", + "\n", + "2. **Run the Cells:** Once the secrets are set, simply run the cells in order from top to bottom. Each visualization will appear as the output of its corresponding code cell."
+ ], + "metadata": { + "id": "RlVp9FqDtHN_" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "p86TlWffChXY" + }, + "outputs": [], + "source": [ + "# Install necessary libraries\n", + "!pip install google-cloud-bigquery geopandas shapely folium mapclassify xyzservices google-maps-places googlemaps" + ] + }, + { + "cell_type": "code", + "source": [ + "# Import libraries\n", + "from google.cloud import bigquery\n", + "from google.colab import auth, userdata, data_table\n", + "from google.api_core import exceptions\n", + "\n", + "from google.maps import places_v1\n", + "\n", + "import requests\n", + "\n", + "import geopandas as gpd\n", + "import shapely\n", + "import sys\n", + "\n", + "import pandas as pd\n", + "\n", + "# Import the mapping libraries\n", + "import folium\n", + "import mapclassify # Used by .explore() for data classification\n", + "import xyzservices # Provides tile layers" + ], + "metadata": { + "id": "C8eoNW5pCiit" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Configure GCP Authentication\n", + "# This part securely gets your GCP Project ID.\n", + "GCP_PROJECT_SECRET_KEY_NAME = \"GCP_PROJECT\" #@param {type:\"string\"}\n", + "GCP_PROJECT_ID = None\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + " try:\n", + " GCP_PROJECT_ID = userdata.get(GCP_PROJECT_SECRET_KEY_NAME)\n", + " if GCP_PROJECT_ID:\n", + " print(f\"Authenticating to GCP project: {GCP_PROJECT_ID}\")\n", + " auth.authenticate_user(project_id=GCP_PROJECT_ID)\n", + " else:\n", + " raise ValueError(f\"Could not retrieve GCP Project ID from secret named '{GCP_PROJECT_SECRET_KEY_NAME}'. \"\n", + " \"Please make sure the secret is set in your Colab environment.\")\n", + " except userdata.SecretNotFoundError:\n", + " raise ValueError(f\"Secret named '{GCP_PROJECT_SECRET_KEY_NAME}' not found. 
\"\n", + " \"Please create it in the 'Secrets' tab (key icon) in Colab.\")" + ], + "metadata": { + "id": "t71EFDfMClnx" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "API_KEY_SECRET_NAME = \"GMP_API_KEY\" #@param {type:\"string\"}\n", + "\n", + "# Initialize a variable to hold our key.\n", + "gmp_api_key = None\n", + "\n", + "try:\n", + " # Attempt to retrieve the secret value using its name.\n", + " gmp_api_key = userdata.get(API_KEY_SECRET_NAME)\n", + " print(\"Successfully retrieved API key.\")\n", + "\n", + "except userdata.SecretNotFoundError:\n", + " raise ValueError(f\"Secret named '{API_KEY_SECRET_NAME}' not found. \"\n", + " \"Please create it in the 'Secrets' tab (key icon) in Colab.\")" + ], + "metadata": { + "id": "84L3L1-2CmD1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Enable interactive tables for pandas DataFrames\n", + "data_table.enable_dataframe_formatter()\n", + "client = bigquery.Client(project=GCP_PROJECT_ID)" + ], + "metadata": { + "id": "yfzhet80Cqt7" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### **Visualizing Competitor Density on a Heatmap**\n", + "\n", + "The code below performs a high-level competitive analysis by querying for specific competitor brands across Clark County. It then generates a heatmap to visualize areas of market saturation. Here's what it shows:\n", + "\n", + "* **Weighted Heatmap:** The map displays heat based on the concentration of major competitor brands (`7-Eleven`, `CVS`, `Walgreens`, etc.). 
The intensity of the \"heat\" is centered on the centroid of each ZIP code.\n", + "* **Color Scale:** The map uses a gradient from blue (low density) to red (high density), making it easy to spot competitive hotspots at a glance.\n", + "* **Geographic Scope:** The analysis is focused on Clark County, Nevada, providing a broad overview of the Las Vegas metropolitan area.\n", + "\n", + "This visualization gives an immediate, high-level understanding of which ZIP codes are already heavily saturated with key competitors.\n", + "\n", + "**Note:** Upcoming cells use 2D Map Tiles. Please review the documentation for pricing." + ], + "metadata": { + "id": "mVrxk9ANug70" + } + }, + { + "cell_type": "code", + "source": [ + "# Define the BigQuery SQL query as a multi-line string\n", + "# This query performs a competitive analysis by counting specific brands\n", + "# ('7-Eleven', 'CVS', 'Walgreens', etc.) within each postal code in Clark County, NV.\n", + "# It then joins these counts with a public dataset to get the geographic shape (polygon) of each postal code.\n", + "brand_competitive_analysis_query = \"\"\"\n", + "WITH brand_counts_by_zip AS (\n", + " SELECT WITH AGGREGATION_THRESHOLD\n", + " postal_code,\n", + " COUNT(*) AS total_brand_count\n", + " FROM\n", + " `places_insights___us.places` AS places_table,\n", + " UNNEST(places_table.postal_code_names) AS postal_code,\n", + " UNNEST(places_table.brand_ids) AS brand_id\n", + " JOIN\n", + " (\n", + " SELECT\n", + " id,\n", + " name\n", + " FROM\n", + " `places_insights___us.brands`\n", + " WHERE\n", + " name IN ('7-Eleven', 'CVS', 'Walgreens', 'Subway Restaurants', \"McDonald's\")\n", + " ) AS brand_names\n", + " ON brand_names.id = brand_id\n", + " WHERE\n", + " places_table.administrative_area_level_2_name = 'Clark County'\n", + " AND places_table.administrative_area_level_1_name = 'Nevada'\n", + " GROUP BY\n", + " postal_code\n", + ")\n", + "-- Now, join the aggregated results to the boundaries table to get the
shapes\n", + "SELECT\n", + " counts.postal_code,\n", + " counts.total_brand_count,\n", + " -- Best practice: Simplify the geometry for faster rendering in maps\n", + " ST_SIMPLIFY(zip_boundaries.zip_code_geom, 100) AS geography\n", + "FROM\n", + " brand_counts_by_zip AS counts\n", + "JOIN\n", + " `bigquery-public-data.geo_us_boundaries.zip_codes` AS zip_boundaries\n", + " ON counts.postal_code = zip_boundaries.zip_code\n", + "ORDER BY\n", + " counts.total_brand_count DESC\n", + "\"\"\"\n", + "\n" + ], + "metadata": { + "id": "hivC3r8KP9Pa" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Execute the query and load the results into a GeoDataFrame.\n", + "\n", + "try:\n", + " print(\"Executing query to find competitor density by ZIP code...\")\n", + " gdf_zip_counts = client.query(brand_competitive_analysis_query).to_geodataframe()\n", + "\n", + " # Set the Coordinate Reference System (CRS) for the GeoDataFrame.\n", + " # WGS84 (EPSG:4326) is the standard for latitude/longitude data.\n", + " gdf_zip_counts.crs = \"EPSG:4326\"\n", + "\n", + " print(\"\\nQuery successful. Displaying top 5 ZIP codes by competitor count:\")\n", + " # Display the first 5 rows of the resulting GeoDataFrame\n", + " display(gdf_zip_counts.head(5))\n", + "\n", + "except exceptions.NotFound as e:\n", + " print(f\"\\nERROR: A table was not found. 
Please ensure you have subscribed to the \"\n", + " f\"'places_insights___us.places' and 'places_insights___us.brands' tables in Analytics Hub.\")\n", + " print(f\"Details: {e}\")\n", + "except Exception as e:\n", + " print(f\"\\nAn unexpected error occurred: {e}\")" + ], + "metadata": { + "id": "mMskvCe6RkbL" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Import the HeatMap plugin from folium\n", + "from folium.plugins import HeatMap\n", + "\n", + "# Verify the GMP API key exists.\n", + "if 'gmp_api_key' not in locals() or gmp_api_key is None:\n", + " raise NameError(\"The 'gmp_api_key' variable is not defined or is None. \"\n", + " \"Please run the API key retrieval cell first.\")\n", + "\n", + "# Request a session token and attribution for Google Maps tiles.\n", + "print(\"Requesting Google Maps session token...\")\n", + "session_url = f\"https://tile.googleapis.com/v1/createSession?key={gmp_api_key}\"\n", + "payload = {\"mapType\": \"roadmap\", \"language\": \"en-US\", \"region\": \"US\"}\n", + "headers = {\"Content-Type\": \"application/json\"}\n", + "\n", + "try:\n", + " response_session = requests.post(session_url, json=payload, headers=headers)\n", + " response_session.raise_for_status() # Raise an error for bad responses\n", + " session_data = response_session.json()\n", + " session_token = session_data['session']\n", + " print(\"Session token acquired successfully.\")\n", + "\n", + " # Fetch dynamic attribution required by Google Maps.\n", + " print(\"Fetching dynamic attribution...\")\n", + " bounds = gdf_zip_counts.total_bounds\n", + " viewport_url = (\n", + " f\"https://tile.googleapis.com/tile/v1/viewport?key={gmp_api_key}\"\n", + " f\"&session={session_token}\"\n", + " f\"&zoom=10\"\n", + " f\"&north={bounds[3]}&south={bounds[1]}\"\n", + " f\"&west={bounds[0]}&east={bounds[2]}\"\n", + " )\n", + " response_viewport = requests.get(viewport_url)\n", + " response_viewport.raise_for_status()\n", + " 
viewport_data = response_viewport.json()\n", + " google_attribution = viewport_data.get('copyright', '© Google')\n", + " print(\"Attribution received.\")\n", + "\n", + "except requests.exceptions.RequestException as e:\n", + " raise RuntimeError(f\"Failed to set up Google Maps tiles. Please check your API key and permissions. Details: {e}\")\n", + "\n", + "# Construct the Tile URL that Folium will use for the basemap.\n", + "google_tiles_url = f\"https://tile.googleapis.com/v1/2dtiles/{{z}}/{{x}}/{{y}}?session={session_token}&key={gmp_api_key}\"\n", + "\n", + "# Ensure the geometry column is correctly named 'geometry' for GeoPandas operations.\n", + "if 'geography' in gdf_zip_counts.columns and 'geometry' not in gdf_zip_counts.columns:\n", + " gdf_zip_counts = gdf_zip_counts.rename(columns={'geography': 'geometry'})\n", + " # Also, explicitly set it as the active geometry column.\n", + " gdf_zip_counts = gdf_zip_counts.set_geometry('geometry')\n", + "\n", + "# Calculate the center of the data to focus the map.\n", + "center_lat = (bounds[1] + bounds[3]) / 2\n", + "center_lon = (bounds[0] + bounds[2]) / 2\n", + "map_center = [center_lat, center_lon]\n", + "\n", + "# Initialize the base map.\n", + "print(\"Initializing base map...\")\n", + "competitor_heatmap = folium.Map(\n", + " location=map_center,\n", + " zoom_start=10,\n", + " tiles=google_tiles_url,\n", + " attr=google_attribution\n", + ")\n", + "\n", + "# Prepare data for the heatmap: a list of [lat, lon, weight] points.\n", + "print(\"Preparing weighted locations from ZIP code centroids...\")\n", + "weighted_locations = []\n", + "for index, row in gdf_zip_counts.iterrows():\n", + " # Get the center point (centroid) of each ZIP code polygon.\n", + " centroid = row['geometry'].centroid\n", + " # Append the centroid's lat/lon and the brand count as its weight.\n", + " weighted_locations.append([centroid.y, centroid.x, row['total_brand_count']])\n", + "\n", + "# Create
and add the heatmap layer to the base map.\n", + "print(\"Generating and adding heatmap layer...\")\n", + "heatmap_layer = HeatMap(\n", + " data=weighted_locations,\n", + " radius=25, # Adjust the influence radius of each data point\n", + " blur=15 # Adjust the smoothness of the color gradient\n", + ")\n", + "heatmap_layer.add_to(competitor_heatmap)\n", + "\n", + "# Display the final map.\n", + "print(\"Displaying map...\")\n", + "display(competitor_heatmap)" + ], + "metadata": { + "id": "D0UmOCrYToXc" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### **Mapping Commercial Suitability with a Choropleth Map**\n", + "\n", + "The code below executes a sophisticated query to calculate a custom `suitability_score` for hexagonal H3 cells across Las Vegas. Here's what the resulting map shows:\n", + "\n", + "* **Choropleth Map:** Each H3 hexagon on the map is colored based on its normalized suitability score, which is calculated from the weighted density of nearby complementary businesses (restaurants, bars, casinos, etc.).\n", + "* **Color Scale:** The map uses a light-to-dark sequential purple scale (`Purples`). Lighter purple areas have lower suitability scores, while the darkest purple areas represent the most commercially vibrant hotspots.\n", + "* **Interactivity:** You can hover over any hexagon to see its unique H3 index, its suitability score, and the specific counts of businesses that contributed to that score. Clicking on a hexagon will open a popup with all the data for that cell.\n", + "\n", + "This visualization moves beyond simple competitor analysis to pinpoint the most promising micro-locations for a new business based on commercial synergy.\n", + "\n", + "**Note:** Upcoming cells use 2D Map Tiles. Please review the documentation for pricing."
+ ], + "metadata": { + "id": "0fdii2b_uwaF" + } + }, + { + "cell_type": "code", + "source": [ + "# Define the BigQuery SQL query to calculate a custom suitability score.\n", + "# This query identifies prime commercial zones in Las Vegas by scoring H3 cells\n", + "# based on a weighted count of nearby businesses open on Mondays between 10 AM and 2 PM.\n", + "suitability_score_query = \"\"\"\n", + "WITH PlacesInTargetAreaWithOpenFlag AS (\n", + " SELECT\n", + " point,\n", + " types,\n", + " EXISTS(\n", + " SELECT 1\n", + " FROM UNNEST(regular_opening_hours.monday) AS monday_hours\n", + " WHERE\n", + " monday_hours.start_time <= TIME '10:00:00'\n", + " AND monday_hours.end_time >= TIME '14:00:00'\n", + " ) AS is_open_monday_window\n", + " FROM\n", + " `places_insights___us.places`\n", + " WHERE\n", + " EXISTS (\n", + " SELECT 1 FROM UNNEST(locality_names) AS locality\n", + " WHERE locality IN ('Las Vegas', 'Spring Valley', 'Paradise', 'North Las Vegas', 'Winchester')\n", + " )\n", + " AND administrative_area_level_1_name = 'Nevada'\n", + "),\n", + "TileScores AS (\n", + " SELECT WITH AGGREGATION_THRESHOLD\n", + " `carto-os.carto.H3_FROMGEOGPOINT`(point, 8) AS h3_index,\n", + " (\n", + " COUNTIF('restaurant' IN UNNEST(types) AND is_open_monday_window) * 8 +\n", + " COUNTIF('convenience_store' IN UNNEST(types) AND is_open_monday_window) * 3 +\n", + " COUNTIF('bar' IN UNNEST(types) AND is_open_monday_window) * 7 +\n", + " COUNTIF('tourist_attraction' IN UNNEST(types) AND is_open_monday_window) * 6 +\n", + " COUNTIF('casino' IN UNNEST(types) AND is_open_monday_window) * 7\n", + " ) AS suitability_score,\n", + " COUNTIF('restaurant' IN UNNEST(types) AND is_open_monday_window) AS restaurant_count,\n", + " COUNTIF('convenience_store' IN UNNEST(types) AND is_open_monday_window) AS convenience_store_count,\n", + " COUNTIF('bar' IN UNNEST(types) AND is_open_monday_window) AS bar_count,\n", + " COUNTIF('tourist_attraction' IN UNNEST(types) AND is_open_monday_window) AS 
tourist_attraction_count,\n", + " COUNTIF('casino' IN UNNEST(types) AND is_open_monday_window) AS casino_count\n", + " FROM\n", + " PlacesInTargetAreaWithOpenFlag\n", + " GROUP BY\n", + " h3_index\n", + "),\n", + "MaxScore AS (\n", + " SELECT MAX(suitability_score) AS max_score FROM TileScores\n", + ")\n", + "SELECT\n", + " ts.h3_index,\n", + " `carto-os.carto.H3_BOUNDARY`(ts.h3_index) AS h3_geography,\n", + " ts.restaurant_count,\n", + " ts.convenience_store_count,\n", + " ts.bar_count,\n", + " ts.tourist_attraction_count,\n", + " ts.casino_count,\n", + " ts.suitability_score,\n", + " ROUND(\n", + " CASE\n", + " WHEN ms.max_score = 0 THEN 0\n", + " ELSE (ts.suitability_score / ms.max_score) * 10\n", + " END,\n", + " 2\n", + " ) AS normalized_suitability_score\n", + "FROM\n", + " TileScores ts, MaxScore ms\n", + "ORDER BY\n", + " normalized_suitability_score DESC;\n", + "\"\"\"" + ], + "metadata": { + "id": "CLzL_v0rb8T_" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Execute the query and load the results into a GeoDataFrame.\n", + "try:\n", + " print(\"Executing query to find most suitable locations in Las Vegas...\")\n", + " gdf_suitability = client.query(suitability_score_query).to_geodataframe()\n", + "\n", + " # Proactively rename the 'h3_geography' column to 'geometry' to adhere to the\n", + " # GeoPandas standard, preventing potential errors in later mapping steps.\n", + " if 'h3_geography' in gdf_suitability.columns:\n", + " gdf_suitability = gdf_suitability.rename(columns={'h3_geography': 'geometry'})\n", + " gdf_suitability = gdf_suitability.set_geometry('geometry')\n", + "\n", + "\n", + " # Set the Coordinate Reference System (CRS) for the GeoDataFrame.\n", + " gdf_suitability.crs = \"EPSG:4326\"\n", + "\n", + " print(\"\\nQuery successful. 
Displaying top 5 most suitable H3 cells:\")\n", + " # Display the first 5 rows of the resulting GeoDataFrame, which represent the\n", + " # highest-scoring locations based on our custom model.\n", + " display(gdf_suitability.head(5))\n", + "\n", + "except exceptions.NotFound as e:\n", + " print(f\"\\nERROR: A table was not found. Please ensure you have subscribed to the \"\n", + " f\"'places_insights___us.places' table in Analytics Hub.\")\n", + " print(f\"Details: {e}\")\n", + "except Exception as e:\n", + " print(f\"\\nAn unexpected error occurred: {e}\")" + ], + "metadata": { + "id": "FGRMe9BPb_Pk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "try:\n", + " # This code block assumes 'google_tiles_url' and 'google_attribution'\n", + " # were successfully created and are available from a previous cell.\n", + " _ = google_tiles_url\n", + " _ = google_attribution\n", + "except NameError:\n", + " raise NameError(\"The required variables 'google_tiles_url' or 'google_attribution' were not found. 
\"\n", + " \"Please run the previous map-generating cell to create them before running this one.\")\n", + "\n", + "\n", + "# Define the columns to show in the map's tooltip on hover.\n", + "suitability_tooltip_cols = [\n", + " 'h3_index',\n", + " 'normalized_suitability_score',\n", + " 'suitability_score',\n", + " 'restaurant_count',\n", + " 'casino_count',\n", + " 'bar_count',\n", + " 'tourist_attraction_count',\n", + " 'convenience_store_count'\n", + "]\n", + "\n", + "# Create the choropleth map using the .explore() function.\n", + "print(\"Generating choropleth map of suitability scores using a purple color scheme...\")\n", + "suitability_map = gdf_suitability.explore(\n", + " column=\"normalized_suitability_score\",\n", + " # The 'Purples' colormap provides a sequential light-to-dark purple scale.\n", + " cmap=\"Purples\",\n", + " scheme=\"NaturalBreaks\",\n", + " k=7,\n", + " tooltip=suitability_tooltip_cols,\n", + " popup=True,\n", + " tiles=google_tiles_url, # Reusing the variable from the previous cell\n", + " attr=google_attribution, # Reusing the variable from the previous cell\n", + " style_kwds={\"stroke\": True, \"color\": \"black\", \"weight\": 0.2, \"fillOpacity\": 0.65}\n", + ")\n", + "\n", + "# Display the final map.\n", + "print(\"Displaying map...\")\n", + "display(suitability_map)" + ], + "metadata": { + "id": "N3tSNtOicrB5" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### **Calculating Coffee Shop Density using a Specialized Function**\n", + "\n", + "The code below uses the `PLACES_COUNT_PER_H3` function in Places Insights to efficiently count existing coffee shops within the Las Vegas metro area.
Here's what the query does:\n", + "\n", + "* **Geofenced Query:** The analysis is automatically constrained to a predefined Las Vegas metro area, created by merging the boundaries of five key localities.\n", + "* **H3 Aggregation:** The function returns precise coffee shop counts neatly aggregated into H3 resolution 8 cells.\n", + "* **Data Output:** The result is a GeoDataFrame containing the H3 index, the total coffee shop count for that cell, and a list of `sample_place_ids` for spot-checking.\n", + "\n", + "This step provides the crucial, quantitative data on the density of our target business type in preparation for the final analysis." + ], + "metadata": { + "id": "ufgmq6HQu9zP" + } + }, + { + "cell_type": "code", + "source": [ + "# Define the BigQuery SQL query to count coffee shops in the Las Vegas metro area.\n", + "# This query first dynamically creates a single polygon representing the metro area\n", + "# by merging several localities from a public map dataset. It then calls the\n", + "# specialized PLACES_COUNT_PER_H3 function to efficiently get the counts.\n", + "coffee_shop_density_query = \"\"\"\n", + "-- Define a variable to hold the combined geography for the Las Vegas metro area.\n", + "DECLARE las_vegas_metro_area GEOGRAPHY;\n", + "\n", + "-- Set the variable by fetching the shapes for the five localities from Overture Maps\n", + "-- and merging them into a single polygon using ST_UNION_AGG.\n", + "SET las_vegas_metro_area = (\n", + " SELECT\n", + " ST_UNION_AGG(geometry)\n", + " FROM\n", + " `bigquery-public-data.overture_maps.division_area`\n", + " WHERE\n", + " country = 'US'\n", + " AND region = 'US-NV'\n", + " AND names.primary IN ('Las Vegas', 'Spring Valley', 'Paradise', 'North Las Vegas', 'Winchester')\n", + ");\n", + "\n", + "-- Call the PLACES_COUNT_PER_H3 function with our defined area and parameters.\n", + "SELECT\n", + " *\n", + "FROM\n", + " `places_insights___us.PLACES_COUNT_PER_H3`(\n", + " JSON_OBJECT(\n", + " 'geography', 
las_vegas_metro_area,\n", + " 'types', [\"coffee_shop\"],\n", + " 'business_status', ['OPERATIONAL'],\n", + " 'h3_resolution', 8\n", + " )\n", + " );\n", + "\"\"\"" + ], + "metadata": { + "id": "uBD7Pmw5zH0K" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Execute the query and load the results into a GeoDataFrame.\n", + "try:\n", + " print(\"Executing query to count coffee shops per H3 cell in the Las Vegas metro area...\")\n", + " gdf_coffee_shops = client.query(coffee_shop_density_query).to_geodataframe()\n", + "\n", + " # The function returns a geometry column named 'h3_geography'. We rename it to the\n", + " # GeoPandas standard 'geometry' for maximum compatibility.\n", + " if 'h3_geography' in gdf_coffee_shops.columns:\n", + " gdf_coffee_shops = gdf_coffee_shops.rename(columns={'h3_geography': 'geometry'})\n", + " gdf_coffee_shops = gdf_coffee_shops.set_geometry('geometry')\n", + "\n", + " # Set the Coordinate Reference System (CRS).\n", + " gdf_coffee_shops.crs = \"EPSG:4326\"\n", + "\n", + " print(\"\\nQuery successful. Displaying the top 5 H3 cells by coffee shop count:\")\n", + "\n", + " # Sort by the 'count' column.\n", + " display(gdf_coffee_shops.sort_values('count', ascending=False).head(5))\n", + "\n", + "except exceptions.NotFound as e:\n", + " print(f\"\\nERROR: A table was not found. Please ensure you have access to both \"\n", + " f\"'places_insights___us.places' and the public 'overture_maps' dataset.\")\n", + " print(f\"Details: {e}\")\n", + "except Exception as e:\n", + " print(f\"\\nAn unexpected error occurred: {e}\")" + ], + "metadata": { + "id": "dIoc1W-szMCz" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# This cell performs a comparative analysis.
It identifies the top 5 zones from your\n", + "# custom suitability model and then cross-references them with the coffee shop\n", + "# density data to see how they compare.\n", + "\n", + "try:\n", + " # --- Step 1: Isolate the Top 5 Most Suitable H3 Cells ---\n", + " # The 'gdf_suitability' DataFrame is already sorted by score, so we just\n", + " # need to take the first 5 rows.\n", + " print(\"Identifying the top 5 H3 cells from the suitability score analysis...\")\n", + " top_suitability_cells = gdf_suitability.head(5)\n", + "\n", + " # Extract the 'h3_index' values from these top 5 cells into a list.\n", + " top_h3_indexes = top_suitability_cells['h3_index'].tolist()\n", + " print(f\"The top 5 H3 indexes are: {top_h3_indexes}\")\n", + "\n", + " # --- Step 2: Filter the Coffee Shop Data to the Top Zones ---\n", + " # We find the rows in our 'gdf_coffee_shops' DataFrame where the\n", + " # 'h3_cell_index' matches one of the indexes from our top 5 list.\n", + "\n", + " # The .isin() method is perfect for filtering a DataFrame based on a list of values.\n", + " # Note: The column name in this DataFrame is 'h3_cell_index'.\n", + " coffee_counts_in_top_zones = gdf_coffee_shops[\n", + " gdf_coffee_shops['h3_cell_index'].isin(top_h3_indexes)\n", + " ]\n", + "\n", + " # --- Step 3: Display the Final Comparison Table ---\n", + " print(\"\\n--- Results ---\")\n", + " print(\"Coffee shop counts within the top 5 most suitable zones:\")\n", + "\n", + " if coffee_counts_in_top_zones.empty:\n", + " print(\"No coffee shops were found in the top 5 most suitable H3 cells.\")\n", + " else:\n", + " # Display the resulting table, sorted by coffee shop count for clarity.\n", + " display(coffee_counts_in_top_zones.sort_values('count', ascending=False))\n", + "\n", + "except NameError as e:\n", + " print(f\"\\nERROR: A required DataFrame was not found.
Please ensure the cells that create \"\n", + " f\"'gdf_suitability' and 'gdf_coffee_shops' have been run successfully.\")\n", + " print(f\"Details: {e}\")\n", + "except KeyError as e:\n", + " print(f\"\\nERROR: A required column was not found. Please check the column names in your DataFrames.\")\n", + " print(f\"Details: {e}\")" + ], + "metadata": { + "id": "SC1HmphC0v8B" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### **Fetching Ground-Truth Details with the Places API**\n", + "\n", + "The code below uses the `sample_place_ids` from our highest-potential zones to fetch detailed, ground-truth information for each coffee shop using the Places API. Here's what it does:\n", + "\n", + "* **Places API Call:** The code programmatically calls the Places API for each unique Place ID, requesting specific fields to optimize for cost and speed.\n", + "* **Rich Information:** It retrieves key details like the business's official name, formatted address, user rating, website, and precise latitude/longitude coordinates.\n", + "* **Structured Output:** The results are compiled into a clean and readable Pandas DataFrame.\n", + "\n", + "This step transforms the aggregated counts from the previous analysis into a concrete list of real-world competitor locations.\n", + "\n", + "**Note:** This cell uses the Places API. Please review the documentation for pricing." + ], + "metadata": { + "id": "wTVdVUy-vK1o" + } + }, + { + "cell_type": "code", + "source": [ + "# Import the necessary libraries for the Places API client and for async operations\n", + "from google.maps import places_v1\n", + "import asyncio\n", + "import pandas as pd\n", + "\n", + "# Verify that the required DataFrame from the previous step exists.\n", + "if 'coffee_counts_in_top_zones' not in locals():\n", + " raise NameError(\"The 'coffee_counts_in_top_zones' DataFrame was not found. 
\"\n", + " \"Please run the previous cell to create it before proceeding.\")\n", + "\n", + "# --- Step 1: Extract and Unify All Place IDs ---\n", + "# The 'sample_place_ids' column contains lists of IDs. We need to flatten this\n", + "# into a single list and get only the unique IDs to avoid duplicate API calls.\n", + "print(\"Extracting unique Place IDs from the data...\")\n", + "all_place_ids = [\n", + " place_id\n", + " for id_list in coffee_counts_in_top_zones['sample_place_ids']\n", + " for place_id in id_list\n", + "]\n", + "unique_place_ids = list(set(all_place_ids))\n", + "print(f\"Found {len(unique_place_ids)} unique places to look up.\")\n", + "\n", + "# --- Step 2: Initialize the Places API Client ---\n", + "# We use the GMP API key that was securely stored in Colab's userdata.\n", + "print(\"Initializing Places API client...\")\n", + "try:\n", + " places_client = places_v1.PlacesAsyncClient(\n", + " client_options={\"api_key\": gmp_api_key}\n", + " )\n", + "except NameError:\n", + " raise NameError(\"The 'gmp_api_key' variable is not defined. 
Please run the API key cell first.\")\n",
+    "\n",
+    "# --- Step 3: Define an Asynchronous Function to Fetch Details ---\n",
+    "# Using an async function allows us to make API calls concurrently in the future if needed.\n",
+    "async def get_details_for_places(client, place_ids):\n",
+    "    \"\"\"Fetches details for a list of Place IDs using the Places API.\"\"\"\n",
+    "    place_details_list = []\n",
+    "    print(f\"Fetching details for {len(place_ids)} places...\")\n",
+    "\n",
+    "    for place_id in place_ids:\n",
+    "        # The 'name' parameter for the API must be in the format 'places/PLACE_ID'.\n",
+    "        request = places_v1.GetPlaceRequest(\n",
+    "            name=f\"places/{place_id}\",\n",
+    "        )\n",
+    "\n",
+    "        # A FieldMask specifies which fields we want in the response.\n",
+    "        # This is a best practice that reduces latency and cost.\n",
+    "        field_mask = \"displayName,formattedAddress,rating,websiteUri,location\"\n",
+    "\n",
+    "        try:\n",
+    "            # Make the asynchronous API call.\n",
+    "            response = await client.get_place(request=request, metadata=[('x-goog-fieldmask', field_mask)])\n",
+    "\n",
+    "            # Store the results in a dictionary.\n",
+    "            details = {\n",
+    "                \"place_id\": place_id,\n",
+    "                \"name\": response.display_name.text,\n",
+    "                \"address\": response.formatted_address,\n",
+    "                \"rating\": response.rating,\n",
+    "                \"website\": response.website_uri,\n",
+    "                \"latitude\": response.location.latitude,\n",
+    "                \"longitude\": response.location.longitude\n",
+    "            }\n",
+    "            place_details_list.append(details)\n",
+    "\n",
+    "        except Exception as e:\n",
+    "            print(f\" - Could not retrieve details for Place ID {place_id}: {e}\")\n",
+    "\n",
+    "    return place_details_list\n",
+    "\n",
+    "# --- Step 4: Run the Function and Display the Results ---\n",
+    "# In a Colab notebook, we can directly 'await' our async function.\n",
+    
"place_details_results = await get_details_for_places(places_client, unique_place_ids)\n", + "\n", + "# Convert the list of dictionaries into a Pandas DataFrame for clean display.\n", + "if place_details_results:\n", + " details_df = pd.DataFrame(place_details_results)\n", + " print(\"\\n--- Place Details Results ---\")\n", + " display(details_df)\n", + "else:\n", + " print(\"\\nNo details were successfully retrieved.\")" + ], + "metadata": { + "id": "xPF80FqW1eTB" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### **Creating a Combined Visualization for Final Analysis**\n", + "\n", + "This final cell synthesizes all our previous findings onto a single, layered, and interactive map for a comprehensive strategic overview. Here’s what it shows:\n", + "\n", + "* **Base Layer (Choropleth):** The map's foundation is our purple suitability choropleth, clearly highlighting the most commercially active zones in the city.\n", + "* **Top Layer (Markers):** Individual coffee cup icons are overlaid on the map, pinpointing the exact locations of existing coffee shops within our top-ranked suitability zones.\n", + "* **Interactivity:** You can explore the map by hovering over hexagons to see suitability data or by clicking on the coffee cup markers to see specific details for each shop, including its name, address, and rating.\n", + "\n", + "This layered map provides a powerful decision-making tool, allowing you to visually correlate high-potential areas with the current competitive landscape on the ground.\n", + "\n", + "**Note:** This cell uses 2D Map Tiles. Please review the documentation for pricing." + ], + "metadata": { + "id": "WieS8MlfvT3b" + } + }, + { + "cell_type": "code", + "source": [ + "import folium\n", + "\n", + "# --- WARNING: This cell reuses variables from a previous cell. 
---\n", + "# Re-running the cell that creates 'google_tiles_url' is recommended for a fresh session token.\n", + "\n", + "try:\n", + " # Verify all required variables from previous steps exist.\n", + " _ = google_tiles_url\n", + " _ = google_attribution\n", + " _ = gdf_suitability\n", + " _ = details_df\n", + "except NameError as e:\n", + " raise NameError(f\"A required variable or DataFrame was not found. Please ensure all previous cells \"\n", + " f\"have been run successfully. Missing: {e}\")\n", + "\n", + "# --- Step 1: Create the Base Choropleth Map ---\n", + "# This is the same code as before, creating our purple suitability map.\n", + "# The map object is stored in the 'combined_map' variable.\n", + "print(\"Generating the base choropleth map of suitability scores...\")\n", + "\n", + "suitability_tooltip_cols = [\n", + " 'h3_index', 'normalized_suitability_score', 'suitability_score',\n", + " 'restaurant_count', 'casino_count', 'bar_count',\n", + " 'tourist_attraction_count', 'convenience_store_count'\n", + "]\n", + "\n", + "combined_map = gdf_suitability.explore(\n", + " column=\"normalized_suitability_score\",\n", + " cmap=\"Purples\",\n", + " scheme=\"NaturalBreaks\",\n", + " k=7,\n", + " tooltip=suitability_tooltip_cols,\n", + " popup=True,\n", + " tiles=google_tiles_url,\n", + " attr=google_attribution,\n", + " style_kwds={\"stroke\": True, \"color\": \"black\", \"weight\": 0.2, \"fillOpacity\": 0.65}\n", + ")\n", + "\n", + "# --- Step 2: Add Markers for Individual Coffee Shops ---\n", + "# Now, we loop through our DataFrame of coffee shop details and add a pin for each.\n", + "print(f\"Adding {len(details_df)} coffee shop markers to the map...\")\n", + "\n", + "for index, row in details_df.iterrows():\n", + " # Format the popup content with HTML for better readability.\n", + " popup_html = f\"\"\"\n", + " {row['name']}
<br>\n",
+    "    Address: {row['address']}<br>
\n", + " Rating: {row['rating']}\n", + " \"\"\"\n", + "\n", + " # Create a Marker for the current coffee shop.\n", + " folium.Marker(\n", + " location=[row['latitude'], row['longitude']],\n", + " # The popup appears when you click the marker.\n", + " popup=folium.Popup(popup_html, max_width=300),\n", + " # The tooltip appears when you hover over the marker.\n", + " tooltip=row['name'],\n", + " # Use a custom icon to clearly represent a coffee shop.\n", + " icon=folium.Icon(color='green', icon='coffee', prefix='fa')\n", + " ).add_to(combined_map) # Add the marker to our map object.\n", + "\n", + "\n", + "# --- Step 3: Display the Final Combined Map ---\n", + "print(\"Displaying combined map...\")\n", + "display(combined_map)" + ], + "metadata": { + "id": "tG8jPc7X3DRa" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file From f9265aa7506171eff6e6e04d6cf8d01983de4252 Mon Sep 17 00:00:00 2001 From: henrikvalv3 Date: Thu, 25 Sep 2025 13:12:16 +0100 Subject: [PATCH 2/3] Spot check results notebook --- .../notebooks/spot_check_results/README.md | 35 ++ ...s_spot_check_results_using_functions.ipynb | 528 ++++++++++++++++++ 2 files changed, 563 insertions(+) create mode 100644 places_insights/notebooks/spot_check_results/README.md create mode 100644 places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb diff --git a/places_insights/notebooks/spot_check_results/README.md b/places_insights/notebooks/spot_check_results/README.md new file mode 100644 index 0000000..ed1a300 --- /dev/null +++ b/places_insights/notebooks/spot_check_results/README.md @@ -0,0 +1,35 @@ +# Spot-Checking Places Insights Data with Functions and Sample Place IDs + +### Overall Goal + +This notebook demonstrates a workflow for spot-checking Places Insights data. 
It starts with a high-level statistical query to find restaurant density and then **directly visualizes both the high-level density and ground-truth sample locations from the city's busiest areas on a single, combined map.**
+
+### Key Technologies Used
+
+* **[Places Insights](https://developers.google.com/maps/documentation/placesinsights):** To provide the Places dataset and the place count function.
+* **[BigQuery](https://cloud.google.com/bigquery):** To run the `PLACES_COUNT_PER_H3` function, which provides aggregated place counts and `sample_place_ids`.
+* **[Google Maps Place Details API](https://developers.google.com/maps/documentation/places/web-service/place-details):** To fetch rich, detailed information (name, address, rating, and a Google Maps link) for the specific `sample_place_ids`.
+* **[Google Maps 2D Tiles](https://developers.google.com/maps/documentation/tile/2d-tiles-overview):** To use Google Maps as the basemap.
+* **Python Libraries:**
+    * **[GeoPandas](https://geopandas.org/en/stable/)** for spatial data manipulation.
+    * **[Folium](https://python-visualization.github.io/folium/latest/)** for creating the final interactive, layered map.
+
+See [Google Maps Platform Pricing](https://mapsplatform.google.com/intl/en_uk/pricing/) for API costs associated with running this notebook.
+
+### The Step-by-Step Workflow
+
+1. **Query Aggregated Data:** We begin by querying BigQuery to count all highly-rated, operational restaurants across London, grouping them into H3 hexagonal cells. This query provides the statistical foundation for our analysis and, crucially, a list of `sample_place_ids` for each cell.
+
+2. **Identify Hotspots & Fetch Details:** The notebook then **automatically** identifies the 20 busiest H3 cells. It consolidates the `sample_place_ids` from all of these top hotspots into a single master list and uses the Places API to fetch detailed information for each one.
+
+3. 
**Create a Combined Visualization:** In the final step, we generate a single, layered map.
+    * The **base layer** is a choropleth "heatmap" showing restaurant density across the entire city.
+    * The **top layer** displays individual pins for all the sample restaurants from the top 20 hotspots, providing a direct, ground-level view of the locations that make up the aggregated counts. Each pin's popup includes a link to open the location directly in Google Maps.
+
+### **How to Use This Notebook**
+
+1. **Set Up Secrets:** Before you begin, you must configure two secrets in the Colab "Secrets" tab (the **🔑 key icon** on the left menu):
+    * `GCP_PROJECT`: Your Google Cloud Project ID with access to Places Insights.
+    * `GMP_API_KEY`: Your Google Maps Platform API key. Ensure the **Map Tiles API** is enabled for this key in your GCP console.
+
+2. **Run the Cells:** Once the secrets are set, simply run the cells in order from top to bottom. Each visualization will appear as the output of its corresponding code cell.
\ No newline at end of file
diff --git a/places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb b/places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb
new file mode 100644
index 0000000..e8393d3
--- /dev/null
+++ b/places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb
@@ -0,0 +1,528 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "name": "python3",
+   "display_name": "Python 3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Spot-Checking Places Insights Data with Functions and Sample Place IDs\n",
+    "\n",
+    "### Overall Goal\n",
+    "\n",
+    "This notebook demonstrates a workflow for spot-checking Places Insights data. 
It starts with a high-level statistical query to find restaurant density and then **directly visualizes both the high-level density and ground-truth sample locations from the city's busiest areas on a single, combined map.**\n",
+    "\n",
+    "### Key Technologies Used\n",
+    "\n",
+    "* **[Places Insights](https://developers.google.com/maps/documentation/placesinsights):** To provide the Places dataset and the place count function.\n",
+    "* **[BigQuery](https://cloud.google.com/bigquery):** To run the `PLACES_COUNT_PER_H3` function, which provides aggregated place counts and `sample_place_ids`.\n",
+    "* **[Google Maps Place Details API](https://developers.google.com/maps/documentation/places/web-service/place-details):** To fetch rich, detailed information (name, address, rating, and a Google Maps link) for the specific `sample_place_ids`.\n",
+    "* **[Google Maps 2D Tiles](https://developers.google.com/maps/documentation/tile/2d-tiles-overview):** To use Google Maps as the basemap.\n",
+    "* **Python Libraries:**\n",
+    "    * **[GeoPandas](https://geopandas.org/en/stable/)** for spatial data manipulation.\n",
+    "    * **[Folium](https://python-visualization.github.io/folium/latest/)** for creating the final interactive, layered map.\n",
+    "\n",
+    "See [Google Maps Platform Pricing](https://mapsplatform.google.com/intl/en_uk/pricing/) for API costs associated with running this notebook.\n",
+    "\n",
+    "### The Step-by-Step Workflow\n",
+    "\n",
+    "1. **Query Aggregated Data:** We begin by querying BigQuery to count all highly-rated, operational restaurants across London, grouping them into H3 hexagonal cells. This query provides the statistical foundation for our analysis and, crucially, a list of `sample_place_ids` for each cell.\n",
+    "\n",
+    "2. **Identify Hotspots & Fetch Details:** The notebook then **automatically** identifies the 20 busiest H3 cells. 
It consolidates the `sample_place_ids` from all of these top hotspots into a single master list and uses the Places API to fetch detailed information for each one.\n",
+    "\n",
+    "3. **Create a Combined Visualization:** In the final step, we generate a single, layered map.\n",
+    "    * The **base layer** is a choropleth \"heatmap\" showing restaurant density across the entire city.\n",
+    "    * The **top layer** displays individual pins for all the sample restaurants from the top 20 hotspots, providing a direct, ground-level view of the locations that make up the aggregated counts. Each pin's popup includes a link to open the location directly in Google Maps.\n",
+    "\n",
+    "### **How to Use This Notebook**\n",
+    "\n",
+    "1. **Set Up Secrets:** Before you begin, you must configure two secrets in the Colab \"Secrets\" tab (the **🔑 key icon** on the left menu):\n",
+    "    * `GCP_PROJECT`: Your Google Cloud Project ID with access to Places Insights.\n",
+    "    * `GMP_API_KEY`: Your Google Maps Platform API key. Ensure the **Map Tiles API** is enabled for this key in your GCP console.\n",
+    "\n",
+    "2. **Run the Cells:** Once the secrets are set, simply run the cells in order from top to bottom. Each visualization will appear as the output of its corresponding code cell."
+ ], + "metadata": { + "id": "UlAo3mXQ_G_Z" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "id": "T7xSzI46psaW" + }, + "outputs": [], + "source": [ + "# Install necessary libraries\n", + "# We use folium and its ecosystem for mapping.\n", + "!pip install google-cloud-bigquery geopandas shapely folium mapclassify xyzservices google-maps-places googlemaps" + ] + }, + { + "cell_type": "code", + "source": [ + "# Import libraries\n", + "from google.cloud import bigquery\n", + "from google.colab import auth, userdata, data_table\n", + "from google.api_core import exceptions\n", + "\n", + "from google.maps import places_v1\n", + "\n", + "import requests\n", + "\n", + "import geopandas as gpd\n", + "import shapely\n", + "import sys\n", + "\n", + "import pandas as pd\n", + "\n", + "# Import the mapping libraries\n", + "import folium\n", + "import mapclassify # Used by .explore() for data classification\n", + "import xyzservices # Provides tile layers" + ], + "metadata": { + "id": "zA53W4ygpzZt" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Configure GCP Authentication\n", + "# This part securely gets your GCP Project ID.\n", + "GCP_PROJECT_SECRET_KEY_NAME = \"GCP_PROJECT\" #@param {type:\"string\"}\n", + "GCP_PROJECT_ID = None\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + " try:\n", + " GCP_PROJECT_ID = userdata.get(GCP_PROJECT_SECRET_KEY_NAME)\n", + " if GCP_PROJECT_ID:\n", + " print(f\"Authenticating to GCP project: {GCP_PROJECT_ID}\")\n", + " auth.authenticate_user(project_id=GCP_PROJECT_ID)\n", + " else:\n", + " raise ValueError(f\"Could not retrieve GCP Project ID from secret named '{GCP_PROJECT_SECRET_KEY_NAME}'. \"\n", + " \"Please make sure the secret is set in your Colab environment.\")\n", + " except userdata.SecretNotFoundError:\n", + " raise ValueError(f\"Secret named '{GCP_PROJECT_SECRET_KEY_NAME}' not found. 
\"\n", + " \"Please create it in the 'Secrets' tab (key icon) in Colab.\")" + ], + "metadata": { + "id": "zZ6eEv6cp1pR" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "API_KEY_SECRET_NAME = \"GMP_API_KEY\" #@param {type:\"string\"}\n", + "\n", + "# Initialize a variable to hold our key.\n", + "gmp_api_key = None\n", + "\n", + "try:\n", + " # Attempt to retrieve the secret value using its name.\n", + " gmp_api_key = userdata.get(API_KEY_SECRET_NAME)\n", + " print(\"Successfully retrieved API key.\")\n", + "\n", + "except userdata.SecretNotFoundError:\n", + " raise ValueError(f\"Secret named '{API_KEY_SECRET_NAME}' not found. \"\n", + " \"Please create it in the 'Secrets' tab (key icon) in Colab.\")" + ], + "metadata": { + "id": "V2C7bTEQOW-6" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Enable interactive tables for pandas DataFrames\n", + "data_table.enable_dataframe_formatter()\n", + "client = bigquery.Client(project=GCP_PROJECT_ID)" + ], + "metadata": { + "id": "b5-gJNwcp9fK" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "restaurants_in_london_sql = \"\"\"\n", + "-- Declare a variable to hold the GEOGRAPHY for London.\n", + "DECLARE london_boundary GEOGRAPHY;\n", + "-- Set the variable by dynamically loading the boundary\n", + "-- from the Overture Maps public dataset.\n", + "SET london_boundary = (\n", + " SELECT geometry\n", + " FROM `bigquery-public-data.overture_maps.division_area`\n", + " WHERE names.primary = 'London' AND country = 'GB' LIMIT 1\n", + ");\n", + "-- Call the function with all parameters in a single JSON_OBJECT.\n", + "SELECT *\n", + "FROM\n", + " `places_insights___gb.PLACES_COUNT_PER_H3`(\n", + " JSON_OBJECT(\n", + " -- Define the search area\n", + " 'geography', london_boundary,\n", + " -- Set the aggregation grid size and other filters\n", + " 'h3_resolution', 8,\n", + " 'types', 
['restaurant'],\n", + " 'business_status', ['OPERATIONAL'],\n", + " 'min_rating', 3.5,\n", + " -- NEW FILTER: Only include places with 100 or more user ratings.\n", + " 'min_user_rating_count', 100\n", + " )\n", + " )\n", + "ORDER BY\n", + " count DESC;\n", + "\"\"\"" + ], + "metadata": { + "id": "K-GzCtZIqERI" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Step 1.2: Execute Query and Create GeoDataFrame\n", + "print(\"Running london query...\")\n", + "df_restaurants_in_london = client.query(restaurants_in_london_sql).to_dataframe()\n", + "\n", + "df_restaurants_in_london['geography'] = df_restaurants_in_london['geography'].dropna().apply(shapely.from_wkt)\n", + "gdf_restaurants_in_london = gpd.GeoDataFrame(df_restaurants_in_london, geometry='geography', crs='EPSG:4326')\n", + "print(f\"Successfully processed {len(gdf_restaurants_in_london)} cells.\")" + ], + "metadata": { + "id": "XqtsN40tqRNl" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Visualizing Restaurant Density on a Heatmap\n", + "\n", + "The code below uses the GeoDataFrame to generate a heatmap. Here's what it shows:\n", + "\n", + "* **Choropleth Map:** Each H3 hexagon on the map is colored based on the number of operational restaurants it contains.\n", + "* **Color Scale:** The map uses a yellow-to-red color scale (`YlOrRd`). Lighter, yellow areas have fewer restaurants, while darker, red areas represent the densest hotspots.\n", + "* **Interactivity:** You can hover over any hexagon to see its unique H3 index and the exact restaurant count. Clicking on a hexagon will open a popup with all the data for that cell.\n", + "\n", + "This visualization gives us an immediate and intuitive understanding of where the major dining hubs are located throughout the city.\n", + "\n", + "**Note:** This cell uses [2D Map Tiles](https://developers.google.com/maps/documentation/tile/2d-tiles-overview). 
Please review the documentation for pricing." + ], + "metadata": { + "id": "sARRaIAY704E" + } + }, + { + "cell_type": "code", + "source": [ + "# Define the columns from your GeoDataFrame that you want to see in the tooltip\n", + "restaurant_tooltip_cols = [\n", + " 'h3_cell_index',\n", + " 'count'\n", + "]\n", + "\n", + "# Verify the GMP API key exists.\n", + "if 'gmp_api_key' not in locals() or gmp_api_key is None:\n", + " raise NameError(\"The 'gmp_api_key' variable is not defined. Please run the API key cell first.\")\n", + "\n", + "\n", + "# Get Session Token\n", + "session_url = f\"https://tile.googleapis.com/v1/createSession?key={gmp_api_key}\"\n", + "payload = {\"mapType\": \"roadmap\", \"language\": \"en-US\", \"region\": \"US\"}\n", + "headers = {\"Content-Type\": \"application/json\"}\n", + "\n", + "response_session = requests.post(session_url, json=payload, headers=headers)\n", + "response_session.raise_for_status()\n", + "session_data = response_session.json()\n", + "session_token = session_data['session']\n", + "\n", + "\n", + "# Get Dynamic Attribution from Viewport API\n", + "# We need to define a bounding box for the viewport request.\n", + "# We'll use the total bounds of our GeoDataFrame.\n", + "bounds = gdf_restaurants_in_london.total_bounds\n", + "viewport_url = (\n", + " f\"https://tile.googleapis.com/tile/v1/viewport?key={gmp_api_key}\"\n", + " f\"&session={session_token}\"\n", + " f\"&zoom=10\"\n", + " f\"&north={bounds[3]}&south={bounds[1]}\"\n", + " f\"&west={bounds[0]}&east={bounds[2]}\"\n", + ")\n", + "\n", + "response_viewport = requests.get(viewport_url)\n", + "response_viewport.raise_for_status()\n", + "viewport_data = response_viewport.json()\n", + "\n", + "# Extract the mandatory copyright/attribution string.\n", + "google_attribution = viewport_data.get('copyright', 'Google') # Fallback to 'Google'\n", + "\n", + "# Construct Tile URL and Display Map\n", + "google_tiles = 
f\"https://tile.googleapis.com/v1/2dtiles/{{z}}/{{x}}/{{y}}?session={session_token}&key={gmp_api_key}\"\n", + "\n", + "# Create the map using the .explore() function on your GeoDataFrame\n", + "# This will create a choropleth map where the color of each H3 cell\n", + "# is based on the number of restaurants it contains.\n", + "london_restaurants_map = gdf_restaurants_in_london.explore(\n", + " column=\"count\", # The column to color the map by\n", + " cmap=\"YlOrRd\", # A color map that's great for density (Yellow-Orange-Red)\n", + " scheme=\"NaturalBreaks\", # A smart way to group data into color buckets\n", + " tooltip=restaurant_tooltip_cols, # The columns to show when you hover\n", + " popup=True, # Show a popup with all data on click\n", + " tiles=google_tiles,\n", + " attr=google_attribution,\n", + " style_kwds={\"stroke\": True, \"color\": \"black\", \"weight\": 0.2, \"fillOpacity\": 0.7} # Styling for the hexagons\n", + ")\n", + "\n", + "# Display the map\n", + "display(london_restaurants_map)" + ], + "metadata": { + "id": "XRWrxqu6rQGn" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Identify Top Hotspots and Consolidate Place IDs\n", + "\n", + "Now that we have the density data for all of London, we will focus our analysis on the 20 busiest areas.\n", + "\n", + "The code below isolates these top 20 H3 cells and extracts up to 10 `sample_place_ids` from each one. It then displays a summary table of these hotspots before consolidating all the IDs into a single master list for analysis. This list will be used in the next step to fetch detailed information for each location." 
+ ], + "metadata": { + "id": "okmimHojzTHf" + } + }, + { + "cell_type": "code", + "source": [ + "# Ensure Colab's interactive data table formatter is enabled.\n", + "data_table.enable_dataframe_formatter()\n", + "\n", + "# Isolate the top 20 H3 cells with the highest restaurant counts.\n", + "print(f\"Identifying the top 20 busiest H3 cells from the {len(gdf_restaurants_in_london)} total cells...\")\n", + "top_20_cells_df = gdf_restaurants_in_london.sort_values(by='count', ascending=False).head(20).reset_index(drop=True)\n", + "print(\"Top 20 cells identified.\")\n", + "\n", + "# For each of the top 20 cells, take the first 10 sample_place_ids.\n", + "# The .apply() method performs this slicing operation on each row's list individually.\n", + "print(\"Extracting up to 10 sample Place IDs from each of the top 20 cells...\")\n", + "sliced_ids_series = top_20_cells_df['sample_place_ids'].apply(lambda id_list: id_list[:10])\n", + "\n", + "# --- Create and display the summary table ---\n", + "print(\"Generating summary table of top hotspots...\")\n", + "summary_df = pd.DataFrame({\n", + " 'H3 Cell Index': top_20_cells_df['h3_cell_index'],\n", + " 'Total Places in Cell': top_20_cells_df['count'],\n", + " 'Sample IDs to Analyze': sliced_ids_series.apply(len)\n", + "})\n", + "display(summary_df)\n", + "\n", + "\n", + "# Consolidate all the sliced lists into a single series.\n", + "# The .explode() function creates a new row for each Place ID.\n", + "all_place_ids_series = sliced_ids_series.explode()\n", + "\n", + "# Get a final list of unique Place IDs to process.\n", + "place_ids_to_process = all_place_ids_series.unique().tolist()\n", + "\n", + "print(f\"\\nConsolidated a total of {len(place_ids_to_process)} unique sample Place IDs to analyze.\")\n", + "\n", + "# Display the first 5 IDs as a sample\n", + "print(\"\\nSample of Place IDs to be processed:\")\n", + "print(place_ids_to_process[:5])" + ], + "metadata": { + "id": "DPIMj07U0Bzv" + }, + "execution_count": null, + 
"outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Fetch Details for Consolidated Place IDs\n", + "\n", + "With our consolidated list of unique Place IDs, we now use the Google Maps Places API to fetch rich, detailed information for each location.\n", + "\n", + "The script below will loop through each ID and retrieve its name, address, user rating, and latitude/longitude coordinates. All of this information is then compiled into a single DataFrame, `details_df`, which will power our final, combined map visualization.\n", + "\n", + "**Note:** This cell uses [Place Details API](https://developers.google.com/maps/documentation/places/web-service/place-details). Please review the documentation for pricing." + ], + "metadata": { + "id": "Xe-jCDMm1pwx" + } + }, + { + "cell_type": "code", + "source": [ + "# Ensure Colab's interactive data table formatter is enabled.\n", + "data_table.enable_dataframe_formatter()\n", + "\n", + "# Check if the list of Place IDs from Phase 1 exists.\n", + "if 'place_ids_to_process' in locals() and place_ids_to_process:\n", + "\n", + " places_client = places_v1.PlacesClient()\n", + " if places_client:\n", + " # Loop through the list of Place IDs and fetch details.\n", + " place_details_list = []\n", + " # Add 'googleMapsUri' to the list of fields we are requesting.\n", + " fields_to_request = \"displayName,formattedAddress,rating,userRatingCount,location,googleMapsUri\"\n", + "\n", + " total_ids = len(place_ids_to_process)\n", + " print(f\"\\nFetching details for {total_ids} unique Place IDs...\")\n", + "\n", + " for i, place_id in enumerate(place_ids_to_process):\n", + " if i > 0 and i % 50 == 0:\n", + " print(f\" ...processed {i} of {total_ids} IDs.\")\n", + "\n", + " try:\n", + " request = {\"name\": f\"places/{place_id}\"}\n", + " response = places_client.get_place(\n", + " request=request,\n", + " metadata=[(\"x-goog-fieldmask\", fields_to_request)]\n", + " )\n", + "\n", + " place_details_list.append({\n", + " 
\"Name\": response.display_name.text,\n", + " \"Address\": response.formatted_address,\n", + " \"Rating\": response.rating,\n", + " \"Total Ratings\": response.user_rating_count,\n", + " \"Place ID\": place_id,\n", + " \"Latitude\": response.location.latitude,\n", + " \"Longitude\": response.location.longitude,\n", + " # Add the new URI field to our collected data.\n", + " \"Google Maps URI\": response.google_maps_uri\n", + " })\n", + " except exceptions.GoogleAPICallError as e:\n", + " print(f\" - Warning: Could not fetch details for Place ID '{place_id}': {e.message}\")\n", + "\n", + " # Convert the list of details into a pandas DataFrame.\n", + " if place_details_list:\n", + " print(f\"\\nSuccessfully fetched details for {len(place_details_list)} places.\")\n", + " details_df = pd.DataFrame(place_details_list)\n", + "\n", + " # Define which columns we want to show in the summary table.\n", + " columns_to_display = [\"Name\", \"Address\", \"Rating\", \"Total Ratings\", \"Place ID\"]\n", + "\n", + " print(\"Here is a sample of the retrieved data (Google Maps URI is hidden):\")\n", + " # Display only the selected columns from the head of the DataFrame.\n", + " display(details_df[columns_to_display].head())\n", + " else:\n", + " print(\"\\nCould not fetch details for any of the sample Place IDs.\")\n", + " details_df = pd.DataFrame()\n", + "\n", + "else:\n", + " print(\"The 'place_ids_to_process' list does not exist or is empty. \"\n", + " \"Please run the previous cell to generate the list of IDs first.\")" + ], + "metadata": { + "id": "wcSoHDb81zE_" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Create the Combined Map\n", + "\n", + "This is the final step where we bring all our analysis together into a single visualization.\n", + "\n", + "The code below first creates the restaurant density heatmap, coloring each H3 cell based on the number of qualifying restaurants. 
Then, it iterates through the restaurant data we fetched in the previous step and overlays a pin for each restaurant onto the map.\n",
+     "\n",
+     "The result is a layered map that shows both the high-level \"hotspots\" and the ground-truth, individual places that make up those dense areas.\n",
+     "\n",
+     "**Note:** This cell uses [2D Map Tiles](https://developers.google.com/maps/documentation/tile/2d-tiles-overview). Please review the documentation for pricing."
+    ],
+    "metadata": {
+     "id": "eFE1gip-8fUd"
+    }
+   },
+   {
+    "cell_type": "code",
+    "source": [
+     "# Check if the required DataFrames from previous steps exist.\n",
+     "if 'gdf_restaurants_in_london' in locals() and not gdf_restaurants_in_london.empty and 'details_df' in locals() and not details_df.empty:\n",
+     "\n",
+     "    restaurant_tooltip_cols = ['h3_cell_index', 'count']\n",
+     "\n",
+     "    # Create the base choropleth map from the H3 cell data.\n",
+     "    print(\"Generating base choropleth map of restaurant density...\")\n",
+     "    combined_map = gdf_restaurants_in_london.explore(\n",
+     "        column=\"count\",\n",
+     "        cmap=\"YlOrRd\",\n",
+     "        scheme=\"NaturalBreaks\",\n",
+     "        tooltip=restaurant_tooltip_cols,\n",
+     "        popup=True,\n",
+     "        tiles=google_tiles,\n",
+     "        attr=google_attribution,\n",
+     "        style_kwds={\"stroke\": True, \"color\": \"black\", \"weight\": 0.2, \"fillOpacity\": 0.7}\n",
+     "    )\n",
+     "\n",
+     "    # Iterate through the detailed restaurant data and add a marker for each one.\n",
+     "    print(f\"Adding {len(details_df)} individual restaurant markers with Google Maps links to the map...\")\n",
+     "    for index, row in details_df.iterrows():\n",
+     "\n",
+     "        # Create a popup with place details.\n",
+     "        popup_html = f\"\"\"\n",
+     "        <b>{row['Name']}</b><br>\n",
+     "        Rating: {row['Rating']} ({row['Total Ratings']} reviews)<br>\n",
+     "        <br>\n",
+     "        {row['Address']}<br><br>\n",
+     "        <a href=\"{row['Google Maps URI']}\" target=\"_blank\">View on Google Maps</a>\n",
+     "        \"\"\"\n",
+     "\n",
+     "        # Create the marker and add it to our existing map object.\n",
+     "        folium.Marker(\n",
+     "            location=[row['Latitude'], row['Longitude']],\n",
+     "            tooltip=row['Name'],\n",
+     "            popup=folium.Popup(popup_html, max_width=300),\n",
+     "            icon=folium.Icon(color='blue', icon='utensils', prefix='fa') # Blue icon to contrast with red map\n",
+     "        ).add_to(combined_map)\n",
+     "\n",
+     "    print(\"Map layers combined successfully. Displaying below.\")\n",
+     "\n",
+     "    # Display the final, combined map.\n",
+     "    display(combined_map)\n",
+     "\n",
+     "else:\n",
+     "    print(\"One or more required DataFrames ('gdf_restaurants_in_london', 'details_df') do not exist or are empty.\")\n",
+     "    print(\"Please ensure you have run the previous cells successfully before this one.\")"
+    ],
+    "metadata": {
+     "id": "qCM--0WL8dAj"
+    },
+    "execution_count": null,
+    "outputs": []
+   }
+  ]
+}
\ No newline at end of file

From 932a2b83fcd66e92cda8dc5c23223e61eac1d79b Mon Sep 17 00:00:00 2001
From: henrikvalv3
Date: Mon, 6 Oct 2025 16:44:06 +0100
Subject: [PATCH 3/3] Improve Places data handling

---
 ...s_spot_check_results_using_functions.ipynb | 52 +++++++++++++------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb b/places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb
index e8393d3..ab68f96 100644
--- a/places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb
+++ b/places_insights/notebooks/spot_check_results/places_insights_spot_check_results_using_functions.ipynb
@@ -475,8 +475,8 @@
      "\n",
      "    restaurant_tooltip_cols = ['h3_cell_index', 'count']\n",
      "\n",
-     "    # Create the base choropleth map from the H3 cell data.\n",
-     "    print(\"Generating base choropleth map of restaurant density...\")\n",
+     "    # Create the base choropleth map from the H3 cell 
data using GeoPandas' .explore()\n",
+     "    print(\"Generating base choropleth map of restaurant density...\")\n",
      "    combined_map = gdf_restaurants_in_london.explore(\n",
      "        column=\"count\",\n",
      "        cmap=\"YlOrRd\",\n",
@@ -488,30 +488,50 @@
      "        style_kwds={\"stroke\": True, \"color\": \"black\", \"weight\": 0.2, \"fillOpacity\": 0.7}\n",
      "    )\n",
      "\n",
-     "    # Iterate through the detailed restaurant data and add a marker for each one.\n",
+     "    # Iterate through the detailed restaurant data to add a marker for each one.\n",
      "    print(f\"Adding {len(details_df)} individual restaurant markers with Google Maps links to the map...\")\n",
-     "    for index, row in details_df.iterrows():\n",
+     "    skipped_count = 0\n",
      "\n",
-     "        # Create a popup with place details.\n",
+     "    for index, row in details_df.iterrows():\n",
+     "        # Defensively validate that coordinates exist for the record.\n",
+     "        lat = row['Latitude']\n",
+     "        lon = row['Longitude']\n",
+     "        if pd.isna(lat) or pd.isna(lon):\n",
+     "            skipped_count += 1\n",
+     "            continue # Skip this record if coordinates are missing.\n",
+     "\n",
+     "        # Clean and sanitize all data that will be used in the popup.\n",
+     "        name = str(row['Name']) if pd.notna(row['Name']) else \"Unnamed Place\"\n",
+     "        rating = row['Rating'] if pd.notna(row['Rating']) else \"N/A\"\n",
+     "        total_ratings = int(row['Total Ratings']) if pd.notna(row['Total Ratings']) else 0\n",
+     "        address = str(row['Address']) if pd.notna(row['Address']) else \"No Address Provided\"\n",
+     "        uri = str(row['Google Maps URI']) if pd.notna(row['Google Maps URI']) else \"#\"\n",
+     "        name = name.replace('`', \"'\")\n",
+     "\n",
+     "        # Create the full HTML content for the marker's popup.\n",
      "        popup_html = f\"\"\"\n",
-     "        <b>{row['Name']}</b><br>\n",
-     "        Rating: {row['Rating']} ({row['Total Ratings']} reviews)<br>\n",
+     "        <b>{name}</b><br>\n",
+     "        Rating: {rating} ({total_ratings} reviews)<br>\n",
      "        <br>\n",
-     "        {row['Address']}<br><br>\n",
-     "        <a href=\"{row['Google Maps URI']}\" target=\"_blank\">View on Google Maps</a>\n",
+     "        {address}<br><br>\n",
+     "        <a href=\"{uri}\" target=\"_blank\">View on Google Maps</a>\n",
      "        \"\"\"\n",
      "\n",
-     "        # Create the marker and add it to our existing map object.\n",
+     "        popup = folium.Popup(popup_html, max_width=300)\n",
+     "\n",
+     "        # Create the marker and add it to the existing map object.\n",
      "        folium.Marker(\n",
-     "            location=[row['Latitude'], row['Longitude']],\n",
-     "            tooltip=row['Name'],\n",
-     "            popup=folium.Popup(popup_html, max_width=300),\n",
-     "            icon=folium.Icon(color='blue', icon='utensils', prefix='fa') # Blue icon to contrast with red map\n",
+     "            location=[lat, lon],\n",
+     "            tooltip=name,\n",
+     "            popup=popup,\n",
+     "            icon=folium.Icon(color='blue', icon='utensils', prefix='fa')\n",
      "        ).add_to(combined_map)\n",
      "\n",
-     "    print(\"Map layers combined successfully. Displaying below.\")\n",
+     "    # Provide a summary message if any records were skipped.\n",
+     "    if skipped_count > 0:\n",
+     "        print(f\"\\nWarning: Skipped {skipped_count} marker(s) due to missing coordinates.\")\n",
      "\n",
-     "    # Display the final, combined map.\n",
+     "    print(\"Map layers combined successfully. Displaying below.\")\n",
      "    display(combined_map)\n",
      "\n",
      "else:\n",