Engineering Case Study

Dashboard Solution
Description

A self-updating personal geolocation intelligence dashboard built on a fully static architecture — no server, no database, no runtime. From a Foursquare check-in to a live deployed dashboard in under 5 minutes.

What Was Built

A personal analytics dashboard that ingests every Foursquare/Swarm check-in and renders it into ten pages plus this engineering write-up: a main analytics dashboard; a trip journal with per-trip maps; a companions tracker; a full check-in feed with historical weather; a tips explorer with country/city tabs, closed/deleted-venue badges, view counts, and filter buttons; a venue loyalty explorer; a world cities map; a full-text search page; a photo gallery with 21 000+ images; and a stats overview — all committed to git and served via Cloudflare Pages CDN.

Beyond rendering, the system maintains a data integrity pipeline: a manual archive workflow snapshots the full check-in history, diffs it against the previous snapshot to detect renamed or moved venues, and propagates those changes into the tips dataset — all without extra API calls. Duplicate rows and check-ins that the API silently stops returning are detected on every full re-fetch and accumulated in an incremental anomaly log (checkins_anomalies.json), giving a permanent auditable history of every data quality event.

The entire system runs on free-tier infrastructure: Cloudflare Pages, Cloudflare Workers, Cloudflare KV, and GitHub Actions. There is no backend, no database, no containers. The build pipeline is 100% Python 3.9+, the frontend is vanilla JS with Leaflet and Chart.js.

~5 min
Check-in → Live Deploy
10
Generated HTML Pages
3-layer
City Normalisation
$0
Monthly Infrastructure Cost
1 min
Polling Interval
The Deployment Pipeline

The system is designed around a push-on-change philosophy: nothing runs unless there is new data. A Cloudflare Worker acts as the real-time sensor, keeping the whole pipeline reactive while consuming negligible resources when idle.

Trigger
Swarm Check-in
T+0
Cloudflare Worker
KV Timestamp Poll
T+0–60s
GitHub API
workflow_dispatch
T+1 min
GitHub Actions
Fetch → Build → Push
T+1–3 min
Cloudflare Pages
CDN Deploy
T+4–5 min

The Worker runs on a 1-minute cron trigger, fetching the most recent check-in from the Foursquare API and comparing its Unix timestamp against the last-seen value stored in Cloudflare KV. If the timestamp is newer, the Worker writes the new value to KV (idempotent — prevents double-triggering on retries) and fires a workflow_dispatch event to GitHub Actions via the REST API.
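The Worker itself is JavaScript, but its compare-and-dispatch logic can be sketched in Python. The `kv_get`/`kv_put` callables stand in for the Worker's KV bindings, and `dispatch` stands in for the `workflow_dispatch` POST to the GitHub REST API (`POST /repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches`) — all names here are illustrative, not the real Worker's:

```python
def poll_and_dispatch(latest_checkin_ts, kv_get, kv_put, dispatch):
    """One cron tick: trigger a rebuild only when a newer check-in exists.

    kv_get/kv_put stand in for Cloudflare KV bindings; dispatch stands in
    for the GitHub workflow_dispatch REST call.
    """
    last_seen = int(kv_get("last_checkin_ts") or 0)
    if latest_checkin_ts <= last_seen:
        return False                      # nothing new — stay idle

    # Write KV *before* dispatching: if the dispatch fails and the Worker
    # retries, it sees the updated timestamp and won't double-trigger.
    kv_put("last_checkin_ts", str(latest_checkin_ts))
    dispatch()
    return True
```

Backing the store with a plain dict makes the idempotency property easy to verify: a second tick with the same timestamp is a no-op.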

GitHub Actions then fetches the updated check-in data from the private data repository, rebuilds all ten HTML pages, commits and pushes to main. Cloudflare Pages detects the push and deploys automatically — no build command needed, since the HTML is already pre-built.

Challenges & Solutions
Challenge 01 — Real-Time vs. Static
Personal dashboards typically require a backend to serve dynamic data on request. Running a server 24/7 for occasional check-in reads is wasteful and introduces ongoing cost and maintenance burden.
Solution — Pre-built Static with Event-Driven Rebuild
All HTML is generated at build time and committed directly to the git repository. Cloudflare Pages serves these files from a global CDN with zero cold-start latency. The "backend" is replaced by a 63-line Cloudflare Worker that only activates when a new check-in is detected — consuming microseconds of CPU per minute versus a server running continuously.
Challenge 02 — Data Quality from Third-Party API
Foursquare returns city and country names inconsistently: Cyrillic spellings (Минск), transliterations (Minski Rayon), district sub-names that don't match any canonical city, and outright wrong country assignments for check-ins near border crossings.
Solution — Three-Layer Normalisation Pipeline
A layered override system processes every check-in through three passes, each with higher precedence than the last:
# Layer 1 — Bulk rename (city_merge.yaml)
"Минск"        → "Minsk"
"Minski Rayon" → "Minsk"

# Layer 2 — Per-timestamp city override (city_fixes.json)
"1706738400" → "Brest"      # border crossing mis-tagged

# Layer 3 — Per-timestamp country override (country_fixes.json)
"1706738400" → "Belarus"    # Foursquare returned Poland
This gives operators surgical control: bulk rules handle the common case, while per-timestamp overrides handle specific data anomalies without touching the raw API data.
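The precedence order above can be sketched as a single pass per row. The field names (`city`, `country`, `date`) are assumptions about the check-in row schema, not confirmed by the source:

```python
def normalise(row, city_merge, city_fixes, country_fixes):
    """Apply the three override layers, each with higher precedence
    than the last (schema field names are illustrative)."""
    city = city_merge.get(row["city"], row["city"])           # Layer 1: bulk rename
    city = city_fixes.get(row["date"], city)                  # Layer 2: per-timestamp city
    country = country_fixes.get(row["date"], row["country"])  # Layer 3: per-timestamp country
    return {**row, "city": city, "country": country}
```

A per-timestamp fix thus silently outranks a bulk rename, which in turn outranks the raw API value — the raw data itself is never mutated.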
Challenge 03 — Timezone Correctness for DST-Free Countries
Resolving timezone from lat/lng coordinates (via timezonefinder) fails for countries that don't observe Daylight Saving Time. For example, Belarus geographically maps to UTC+2 in summer, but politically observes UTC+3 year-round. This causes check-in local times to be off by one hour for half the year.
Solution — Explicit Country → IANA Timezone Dict
metrics.py maintains a _COUNTRY_TZ dictionary mapping country names to authoritative IANA timezone IDs. This takes precedence over the coordinate-based lookup. Europe/Minsk is always UTC+3, regardless of what the geometric timezone boundary says — because that's what the clocks actually show.
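A minimal sketch of that precedence, assuming a `TimezoneFinder`-style fallback object; the dictionary excerpt is hypothetical and far smaller than the real `_COUNTRY_TZ`:

```python
from zoneinfo import ZoneInfo

# Hypothetical excerpt — the real _COUNTRY_TZ in metrics.py is larger.
_COUNTRY_TZ = {
    "Belarus": "Europe/Minsk",   # UTC+3 year-round, no DST
    "Russia":  "Europe/Moscow",
}

def resolve_tz(country, lat, lng, tf=None):
    """Country dict wins; only fall back to coordinate lookup."""
    if country in _COUNTRY_TZ:
        return ZoneInfo(_COUNTRY_TZ[country])
    if tf is not None:                        # tf: a TimezoneFinder instance
        name = tf.timezone_at(lat=lat, lng=lng)
        if name:
            return ZoneInfo(name)
    return ZoneInfo("UTC")
```

Because `Europe/Minsk` has been UTC+3 with no DST since 2011, a summer check-in resolved this way gets a +3 h offset even though the geometric zone would suggest +2.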
Challenge 04 — Weather Data Date Mismatch
The recent check-ins feed shows historical weather for each location. Check-ins made in the early hours of the morning in UTC+3 timezones (e.g., 01:14 Minsk time) correspond to 22:14 UTC the previous day — but the dashboard was requesting weather using the local date ("March 12"), not the UTC date ("March 11"). The archive API had no data for March 12 yet, so weather was silently absent.
Solution — Derive Date from Unix Timestamp via UTC
All Open Meteo archive API requests now derive the date from the raw Unix timestamp:
// Before (broken): uses pre-computed local date field
const url = `...&start_date=${r.date}...&timezone=${r.tz_name}`;

// After (correct): derive UTC date from timestamp
const utcDate = new Date(r.ts * 1000).toISOString().slice(0, 10);
const url = `...&start_date=${utcDate}...&timezone=auto`;
The fix aligned the dashboard's weather fetch strategy with the feed page, which had always used the UTC-based approach. One source of truth, zero silent failures.
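The same date derivation, expressed in the build pipeline's Python for clarity — the function name and the specific timestamp are illustrative, chosen to reproduce the 01:14-Minsk edge case from the challenge:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def archive_date(ts: int) -> str:
    """Date string for an archive-weather request, derived from the raw
    Unix timestamp in UTC — never from a pre-computed local date field."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

# 2024-03-11 22:14 UTC is 2024-03-12 01:14 in Minsk (UTC+3):
ts = 1710195240
local = datetime.fromtimestamp(ts, tz=ZoneInfo("Europe/Minsk"))
```

Requesting weather for the UTC date ("2024-03-11") succeeds even while the local calendar already reads March 12.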
Challenge 05 — Private Data in a Public Repository
The raw checkins.csv contains full location history: GPS coordinates, venue IDs, timestamps, and companion names across years of travel. Committing this to a public repo exposes structured personal data to anyone crawling GitHub.
Solution — Separated Private Data Repository
The CSV lives in a separate private GitHub repository (foursquare-data). A fine-grained Personal Access Token scoped exclusively to that repository allows GitHub Actions to check it out at build time. The public repository never sees the raw CSV — only the generated HTML output, which contains only the aggregated, display-ready data already visible on the site. The PAT has Contents: read/write and nothing else.
Challenge 06 — City Name Collisions Across Continents
"Rabat" is the capital of Morocco and also a city in Malta. When rendering the world cities map, a naive name-match would mark Morocco's Rabat as visited when only Malta's was, or vice versa — silently inflating the visited cities count.
Solution — Continent-Aware City Matching
Both index.html.tmpl and gen_worldcities.py implement a CTRY_CONT JavaScript dictionary mapping every country to its continent. The matchVisited() function rejects a world-cities database match unless the candidate city's country falls on the same continent as the visited check-in's country. This guard is maintained in sync across both files — a documented invariant noted in CLAUDE.md.
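The guard reduces to a continent-equality check. The table excerpt below is hypothetical (the real `CTRY_CONT` covers every country), and the Python function mirrors the JS `matchVisited()` logic rather than reproducing it verbatim:

```python
# Hypothetical excerpt — the real CTRY_CONT maps every country.
CTRY_CONT = {"Morocco": "Africa", "Malta": "Europe", "Belarus": "Europe"}

def match_visited(candidate_country, visited_country):
    """Accept a world-cities DB match only when both countries
    fall on the same (known) continent."""
    cont = CTRY_CONT.get(candidate_country)
    return cont is not None and cont == CTRY_CONT.get(visited_country)
```

With this guard, a visit to Rabat, Malta can never light up Rabat, Morocco: Africa ≠ Europe, so the name collision is rejected.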
Challenge 07 — Automatically Detecting Trip Boundaries
A raw check-in stream has no concept of "trip". A journey from home to another city and back is just a sequence of venue visits — with no explicit departure event, no arrival marker, and often a trail of in-transit check-ins (train stations, fuel stops, border crossings) that belong to the trip but fall before the first non-home city check-in and after the last. A naïve "city ≠ home" window misses the journey itself and produces ragged trip boundaries.
Solution — Multi-Pass Extension Pipeline
metrics.py runs an 8-pass pipeline over each candidate trip window to progressively widen its boundaries until they reflect the actual journey:
# Pass 1 — Seed: consecutive non-home city rows (≥ min_checkins)
# Pass 2 — Same-day departure: scan backward for Transportation/Bus/Parking
#          on departure date; fuel stops absorbed if nearest to seed
# Pass 3 — Arrival hub: scan forward for train station / airport on return day
# Pass 4 — Intermediate home-city rows absorbed if flanked by trip cities
# Pass 5 — Repeat passes 2–4 until stable (up to 5 iterations)
# Pass 6 — trip_start_overrides: prepend rows from a specific timestamp
# Pass 7 — Bicycle extension: 4 h window, earliest passthrough category
#          as anchor (Road, Bridge, Park, Bike Rental…), include all
#          intervening rows between anchor and trip seed
# Pass 8 — Neighbourhood fallback: absorb a single home-city neighbourhood
#          check-in as arrival if no transport hub was found
Airport detection uses a substring match ("airport" in cat.lower()) rather than an exact string, catching all variants emitted by Foursquare: International Airport, Airport Terminal, Airport Gate, Airport Service. A prev_end_idx guard prevents the backward scan from crossing into a preceding trip's arrival rows. Where the heuristics fall short, three JSON config files provide surgical overrides: trip_start_overrides and trip_end_overrides pin exact boundary timestamps, and trip_names.json / trip_tags.json attach human-readable names and activity tags (bicycle, camping, etc.) keyed by the final resolved start timestamp.
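The substring category match and the `prev_end_idx` guard can be sketched together; the function names and the `TRANSPORT_HUBS` tuple are illustrative, not the actual identifiers in metrics.py:

```python
TRANSPORT_HUBS = ("airport", "train station")   # substring match, lowercase

def is_arrival_hub(category: str) -> bool:
    """Substring match catches every Foursquare variant:
    'International Airport', 'Airport Terminal', 'Airport Gate', …"""
    c = category.lower()
    return any(h in c for h in TRANSPORT_HUBS)

def backward_scan(rows, seed_start, prev_end_idx, pred):
    """Scan backward from the trip seed, never crossing into the
    previous trip's rows (the prev_end_idx guard). Returns the index
    of the first matching row, or None."""
    for i in range(seed_start - 1, prev_end_idx, -1):
        if pred(rows[i]):
            return i
    return None
```

The guard matters because without it a departure scan could swallow the tail of the previous trip's arrival rows, merging two journeys into one.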
Challenge 08 — Tips on Closed Venues Are Silently Omitted
Foursquare's /users/self/tips endpoint silently omits any tip written on a venue that has since been closed or deleted. With no error, no flag, and no indication in the response, a full fetch of 1 782 tips appeared complete — until a per-venue sweep revealed 25 additional tips that existed only on closed venues. A secondary problem: the API returns country names in the local language (Беларусь, Republica Moldova, المغرب…) rather than a consistent English form, breaking grouping, flag lookup, and tab rendering.
A third gap emerged when cross-checking against a Foursquare data export (downloaded from account settings): the export contained 1 912 tips versus 1 807 in the API-sourced file — a delta of 104 missing tips, plus a viewCount field entirely absent from the API response. The export also revealed that presence in checkins.csv is not a reliable proxy for venue activity: a venue can appear in historical check-ins and still be closed on Foursquare today. Determining true closed/deleted status required fetching each venue page individually.
Solution — Two-Strategy Fetch + Data Export Cross-Check + Page Verification
The tips pipeline uses two complementary strategies that together achieve API completeness:
# Strategy 1 — users endpoint (fast, ~1 800 tips in one paginated sweep)
GET /v2/users/self/tips?limit=500&offset=0      # silent about closed venues

# Strategy 2 — per-venue sweep (catches closed-venue tips)
for venue_id in checkins_csv:
    if venue_id not in existing_tips:
        GET /v2/venues/{venue_id}/tips?filter=self   # reveals closed-venue tips
        if new_tip_found:
            tip["closed"] = True                     # auto-marked as closed
Any tip discovered exclusively via the sweep — not returned by the users endpoint — is automatically marked closed=True in tips.json. The 25 pre-existing sweep tips were identified retroactively by comparing the initial 1 782-tip commit in the data repo against HEAD (1 807 tips); the 25-ID delta was patched directly.
For the 104 export-only tips, closed status could not be inferred from whether the venue_id appeared in the users endpoint response — a venue can have other active tips and still be closed. The only reliable source was the venue page itself. All 100 unique venue IDs were fetched via foursquare.com/v/{id} with browser session cookies (the public page embeds "closed":true in its __NEXT_DATA__ JSON or in the raw HTML for closed venues). 95 of 100 venues were confirmed closed on-page; the remaining 5 loaded on the legacy app.foursquare.com renderer with no closed marker, indicating they are still active — and their tips were found to have been deleted by moderators rather than lost to venue closure.
# Closed/deleted status determination for export-only tips
for venue_id in new_tip_venue_ids:
    html = GET foursquare.com/v/{venue_id}      # with browser cookies
    if '"closed":true' in html or __NEXT_DATA__["closed"]:
        tip["closed"] = True     # 99 tips — venue confirmed closed
    elif loads_on_legacy_renderer:
        tip["deleted"] = True    # 5 tips — venue active, tip deleted by mod
The export also provided viewCount for all tips — a field the API never returns. fetch_tips.py was updated to capture viewCount going forward; historical counts from the export were backfilled into tips.json in one pass. View counts are refreshed on each full re-fetch (--full); incremental runs only touch tips newer than the latest known timestamp.
Country name normalisation uses a CTRY_NORM dictionary in gen_tips.py mapping every local-language variant to its English form. City names reuse the existing city_merge.yaml pipeline. Both normalised values are stored as nc (country) and nci (city) on each tip record and propagated into the recent-30 tips slice embedded in index.html. Tip cards display a red CLOSED badge, a purple DELETED badge, and a 👁 view count in the footer. Three dedicated filter buttons — By Date, Closed only, and Deleted only — let the reader surface each data quality category directly.
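A minimal sketch of that normalisation step; the `CTRY_NORM` excerpt is hypothetical (the real dictionary in gen_tips.py covers every local-language variant observed), and `normalise_tip` is an illustrative name:

```python
# Hypothetical excerpt — the real CTRY_NORM covers every variant seen.
CTRY_NORM = {
    "Беларусь": "Belarus",
    "Republica Moldova": "Moldova",
    "المغرب": "Morocco",
}

def normalise_tip(tip, city_merge):
    """Attach normalised country (nc) and city (nci) fields to a tip
    record, reusing the check-in pipeline's city_merge mapping."""
    tip["nc"] = CTRY_NORM.get(tip["country"], tip["country"])
    tip["nci"] = city_merge.get(tip["city"], tip["city"])
    return tip
```

Storing the normalised values alongside the raw ones means grouping, flag lookup, and tab rendering all key off `nc`/`nci` while the original API strings remain intact.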
Challenge 09 — Venue Metadata Drifts Silently Over Time
Foursquare venues can be renamed, moved, or recategorised at any time. A check-in stored in checkins.csv three years ago may now point to a venue with a different name, city, or coordinates — and tips.json, which duplicates that venue metadata per tip, can drift out of sync independently. Additionally, full re-fetches occasionally surface a second problem: some check-ins that existed in the old CSV are simply absent from the API response (deleted or merged venues), while other rows appear duplicated within the historical data with no indication of the double-entry. Both silent drift and silent data loss are impossible to detect without an explicit comparison.
Solution — Snapshot Diff + Incremental Anomaly Log
The Archive check-in snapshot GitHub Actions workflow (manual trigger) addresses both problems in a single run:
# 1. Archive current CSV with UTC timestamp before overwriting
cp checkins.csv archive/checkins_2026-03-26T12-00-00Z.csv

# 2. Full re-fetch via beforeTimestamp pagination (no API cap)
python fetch_checkins.py --full        # quota-safe: saves partial on 403

# 3. Diff old vs new — detect venue renames, moves, category changes
python sync_venue_changes.py \
    --old archive/checkins_${ARCHIVE_NAME} \
    --new checkins.csv \
    --tips tips.json                   # patches tips in-place, no extra API calls
sync_venue_changes.py compares the two snapshots on six fields per venue_id: venue, city, country, lat, lng, category. For each changed venue it patches every matching tip in tips.json in-place — converting lat/lng to float rounded to 5 dp to match the tips schema — and logs a numbered summary of every updated tip. The patched tips.json is committed alongside the fresh CSV in the same atomic commit, keeping both files permanently in sync without any additional API quota.
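The core of that comparison is a per-`venue_id` diff over the six tracked fields. This sketch assumes the snapshots are already parsed into lists of dicts; the function name is illustrative, not the real script's API:

```python
FIELDS = ("venue", "city", "country", "lat", "lng", "category")

def venue_changes(old_rows, new_rows):
    """Return {venue_id: {field: (old, new)}} for every venue whose
    tracked metadata drifted between the two snapshots."""
    def index(rows):
        return {r["venue_id"]: r for r in rows}
    old, new = index(old_rows), index(new_rows)
    changes = {}
    for vid in old.keys() & new.keys():       # only venues present in both
        diff = {f: (old[vid][f], new[vid][f])
                for f in FIELDS if old[vid][f] != new[vid][f]}
        if diff:
            changes[vid] = diff
    return changes
```

Each entry in the result is then enough to patch every matching tip in tips.json without a single extra API call.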
The full re-fetch also runs two anomaly detectors and accumulates their results incrementally in checkins_anomalies.json:
// checkins_anomalies.json — grows on every full re-fetch
{
  "duplicates": [   // rows with identical (venue_id, date) in the CSV
    { "date": "1384169327", "venue": "Суперпрод", "city": "Minsk", ... }
  ],
  "missing": [      // rows present in CSV but absent from API response
    { "date": "1625992035", "venue": "Chashmayi Ayyub", "city": "Bukhara", ... }
  ]
}
Duplicates are rows whose (venue_id, date) key appears more than once — identical double-entries from early Swarm usage. They are intentionally preserved in the CSV rather than silently removed; the anomaly file provides visibility without data loss. A duplicate_checkins.csv sidecar is also written for direct inspection. Missing rows are check-ins present in the existing CSV but absent from the API response — venues that Foursquare deleted or merged. These too are preserved and recorded so the count discrepancy is explained and auditable. Both lists accumulate across runs: new entries are merged in, existing entries are never removed, giving a permanent history of every data quality event the re-fetch has ever observed.
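The two detectors reduce to set and multiset operations over the `(venue_id, date)` key. A sketch under the assumption that both inputs are lists of dicts with those fields (the function name is illustrative):

```python
from collections import Counter

def detect_anomalies(csv_rows, api_rows):
    """Duplicates: (venue_id, date) keys appearing more than once in
    the CSV. Missing: CSV rows absent from the fresh API response."""
    keys = Counter((r["venue_id"], r["date"]) for r in csv_rows)
    duplicates = [r for r in csv_rows
                  if keys[(r["venue_id"], r["date"])] > 1]
    api_keys = {(r["venue_id"], r["date"]) for r in api_rows}
    missing = [r for r in csv_rows
               if (r["venue_id"], r["date"]) not in api_keys]
    return duplicates, missing
```

Both result lists are merged into the accumulated anomaly log rather than replacing it, which is what makes the file a permanent history rather than a point-in-time report.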
Challenge 10 — Recovering Companion Overlap Data from a Deprecated API
Foursquare's overlaps field — people who independently checked in at the same venue at the same time, distinct from explicit with companions — was only ever available via the legacy /v2/checkins/{id} endpoint and is no longer surfaced through any current API. With ~65 000 historical check-ins, a one-time enrichment run was the only way to recover this data before it disappeared entirely.
Three compounding problems made the enrichment non-trivial:
1. Quota exhaustion mid-run. At 0.35 s per call the script consumed the daily quota mid-run. The original error handler treated all HTTP 403 responses identically — marking the row as a permanent skip ("-") — so quota-exhausted rows were silently discarded alongside genuinely inaccessible ones, with no way to tell them apart after the fact.
2. Overlaps duplicating with companions. For a subset of check-ins Foursquare returned the explicit with companion in the overlaps list as well, producing duplicate entries across overlaps_name and with_name / created_by_name. 408 rows were affected.
3. Crash recovery without reprocessing 51 000 rows. A mid-run crash after ~7 400 rows left 1 089 FOUND entries in the terminal log but no record of which check-in IDs they corresponded to. The remaining ~57 000 rows still needed processing, making a full restart wasteful.
Solution — Quota-Aware Retry, Dedup Filter, Position-Based Recovery
Each problem was addressed with a targeted fix:
# Fix 1 — distinguish quota 403 from access-denied 403
if resp.status_code == 403:
    error_type = resp.json()["meta"]["errorType"]
    if error_type == "rate_limit_exceeded":
        raise HTTPError("quota")   # → pause + retry, NOT permanent skip
    return "-", "-"                # genuine access denied → skip, log reason

# Fix 2 — dedup filter after each successful API call
existing = {name.lower() for name in with_name + created_by_name}
pairs = [(n, i) for n, i in zip(overlap_names, overlap_ids)
         if n.lower() not in existing]   # strip duplicates, keep genuine

# Fix 3 — position-based crash recovery via --only-ids-file
# Two known anchor pairs from the log (pos → checkin_id) gave OFFSET=4717
# rows_with_cid[pos - 1 + 4717] → exact checkin_id for each FOUND line
python enrich_overlaps.py --only-ids-file recovery_ids.txt
# resets any incorrectly-marked rows for those IDs, then processes only them
The --only-ids-file flag accepts a plain text file of one checkin_id per line. Before building the work queue it resets any row in that set whose overlaps_id was incorrectly finalised (back to ""), ensuring the IDs are always re-processed regardless of prior run state. Sleep was increased from 0.35 s to 1.5 s per call to stay safely below the quota ceiling for the full 65 000-row run.
Post-enrichment cleanup applied the dedup filter retroactively: 408 existing rows had their overlaps_name / overlaps_id scrubbed of any name already present in with_name or created_by_name, with entries reduced to "-" where nothing genuine remained. The surviving 38 genuine overlaps — people who happened to be at the same place at the same time, entirely independently — were committed to the data repo and rendered in the companions page.
Challenge 11 — Recovering and Hosting 21 000+ Check-in Photos
Foursquare's public API exposes photo metadata per check-in, but rate limits and the sheer volume (~21 000 images for ~65 000 check-ins) made a naive full re-fetch impractical. The private data export archive contained all historical photos, but the export's CSV files referenced internal Foursquare URLs rather than actual image bytes — so photos still had to be fetched individually, and the index had to be built from scratch.
Three compounding problems:
1. Incremental indexing without re-downloading history. The photo index (photos.json) grows over time as new check-ins are added. A naive run would re-probe every un-indexed check-in on every CI run — including the ~51 000 that were already confirmed to have no photos — wasting quota and time.
2. Orphan photos belonging to tips, not check-ins. After cross-referencing all 21 168 downloaded jpg files against the check-in index, 17 files were unaccounted for. Investigation of the export's photo URLs revealed 12 of them used an /item/{id} path rather than /checkin/{id} — matching tip IDs, not check-in IDs. They had been silently mis-classified.
3. Serving 21 000+ files on a static site at zero cost. GitHub Pages has a 1 GB repo limit; embedding binary assets in the repo was not viable. The site needed a public CDN that was genuinely free at this scale.
Solution — Export Cutoff Detection, Tip Photo Discovery, Cloudflare R2
Each problem was addressed with a targeted fix:
# Fix 1 — auto-detect cutoff; only probe check-ins newer than already indexed
known = set(photos_by_checkin.keys())
cutoff_ts = max(
    (int(r["date"]) for r in rows if r["checkin_id"] in known),
    default=0
)
pending = [r["checkin_id"] for r in rows
           if r["checkin_id"] not in known
           and int(r["date"]) > cutoff_ts]   # skip old confirmed-empty check-ins

# Fix 2 — tip photos stored in tips.json, not photos.json
# Export URL pattern: /item/{tip_id}/photo → tip photo, not checkin photo
# 12 matched tips.json entries by ID → added "photo": "filename.jpg" field
# photo_url computed at build time: pix_url + "/" + tip["photo"]
tip["photo_url"] = pix_url.rstrip("/") + "/" + tip["photo"]

# Fix 3 — Cloudflare R2 (free tier: 10 GB, 10M reads/month, $0 egress)
# Files uploaded at pix/filename.jpg; --pix-url must include the /pix prefix
aws s3 sync private-data/pix/ s3://${R2_BUCKET_NAME}/pix/ \
    --endpoint-url https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com
python scripts/build.py \
    --photos private-data/photos.json \
    --pix-url "${R2_PUBLIC_URL}"   # e.g. https://pub-xxxx.r2.dev/pix
The photos.html gallery renders 21 000+ images lazily in batches of 300, with a country/city accordion filter (countries collapsed by default, cities as pill buttons), a separate tip photos section with its own lightbox mode, and a hero count that includes both check-in and tip photos with an anchor link to the tip section. The multi-photo badge on index.html recent-check-in cards shows the first photo plus a +N overlay when a check-in has multiple images; clicking navigates through all photos for that check-in in the inline lightbox.
In CI, aws s3 sync uploads only new files (those not yet in the bucket), making each incremental deploy fast regardless of total gallery size. The --pix-url flag keeps the build fully decoupled: local builds use a file:/// URI; the deployed site uses the R2 public URL — no code changes needed between environments.
Zero-Dependency Static Generation

The build system is intentionally minimal. build.py is the single orchestrator: it loads YAML/JSON config, calls transform.py to normalise city and country names, calls metrics.py to compute all aggregations and trip detection, then renders two Jinja-free template files using simple {{PLACEHOLDER}} substitution.

The four generator scripts (gen_companions.py, gen_feed.py, gen_venues.py, gen_worldcities.py) each embed their complete HTML template as a base64-encoded string (_TMPL_B64). This makes each generator fully self-contained — no external template files, no path resolution issues regardless of working directory. To modify a page's design, you base64-decode the string, edit the HTML/CSS/JS, re-encode.
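The embed-decode-render pattern fits in a few lines. The template below is a toy stand-in for the real page markup; the `render` function name and `{{BODY}}` placeholder are illustrative:

```python
import base64

# Toy stand-in for a generator's embedded page template.
_TMPL_B64 = base64.b64encode(b"<html>{{BODY}}</html>").decode()

def render(body: str) -> str:
    """Decode the embedded template and substitute a placeholder —
    the same pattern the gen_*.py generators use."""
    tmpl = base64.b64decode(_TMPL_B64).decode()
    return tmpl.replace("{{BODY}}", body)
```

Because the template travels inside the script as a string constant, the generator works identically from any working directory, at the cost of an extra decode/edit/re-encode step when changing the design.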

Trip detection in metrics.py begins with a single-pass scan over the sorted check-in sequence: any consecutive run of check-ins where city ≠ home_city and the run length exceeds min_checkins (configurable) becomes a candidate trip, which the multi-pass extension pipeline described in Challenge 07 then widens to its real boundaries. Trip names are auto-generated from the most-visited countries and cities in that sequence.
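The seeding scan can be sketched as a run-length grouping; the function name and the minimal row schema (just a `city` field) are assumptions for illustration:

```python
def seed_trips(rows, home_city, min_checkins=3):
    """Pass 1 only: consecutive runs of non-home-city check-ins become
    candidate trips (later widened by the extension passes)."""
    trips, run = [], []
    for r in rows:
        if r["city"] != home_city:
            run.append(r)
        else:
            if len(run) >= min_checkins:
                trips.append(run)
            run = []
    if len(run) >= min_checkins:      # sequence may end mid-trip
        trips.append(run)
    return trips
```

Runs shorter than `min_checkins` — a lone check-in in a neighbouring town, say — are discarded rather than becoming one-stop "trips".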

# Data flow — single build command
python scripts/build.py \
    --input data/checkins.csv \
    --config-dir config \
    --output-dir .

# Internal execution order:
#  1. transform.py       — apply 3-layer city/country normalisation
#  2. metrics.py         — compute aggregations, trip detection, centroids
#  3. index.html         — rendered from templates/index.html.tmpl
#  4. trips.html         — rendered from templates/trips.html.tmpl
#  5. gen_companions.py  → companions.html
#  6. gen_feed.py        → feed.html
#  7. gen_venues.py      → venues.html
#  8. gen_worldcities.py → world_cities.html
#  9. gen_tips.py        → tips.html
# 10. gen_photos.py      → photos.html
# 11. gen_stats.py       → stats.html
# 12. gen_search.py      → search.html + search-index.json
Technology Choices

Every choice optimises for zero operational overhead and long-term maintainability — no framework churn, no node_modules in the build pipeline, no containers to patch.

Build Pipeline
Python 3.9+
zoneinfo · requests · pyyaml · timezonefinder
Real-Time Poller
Cloudflare Worker
ES module · 1-min cron · KV timestamp store
CI/CD
GitHub Actions
schedule + workflow_dispatch · 2-repo checkout
Hosting & CDN
Cloudflare Pages
Auto-deploy on push · global CDN · free tier
Maps
Leaflet 1.9 + leaflet.heat
Heatmap · dot map · trip maps · country flags
Charts
Chart.js 4.4
Bar · line · doughnut · polar area
Data Source
Foursquare API v2
Incremental fetch · full re-fetch mode
Weather
Open-Meteo Archive
Historical weather per check-in · free · no key
Data Privacy
Private GitHub Repo
Fine-grained PAT · Contents scope only
Photo Storage
Cloudflare R2
21 000+ photos · 10 GB free · $0 egress · S3-compatible sync
What Matters and Why
Correctness over convenience
Every timestamp is processed in the observer's local time using authoritative IANA timezone IDs, not browser-inferred offsets. DST transitions, political timezone changes, and UTC vs. local date edge cases are all handled explicitly.
🔒
Least-privilege by default
The data PAT has Contents read/write on one repo only. The Worker's GitHub token has workflow scope only. No secret has broader permissions than its single job requires.
🪶
No framework lock-in
No React, no Next.js, no bundler, no transpiler. The build is pure Python. The frontend is plain HTML + vanilla JS. In five years, the dashboard will still build with python scripts/build.py and serve from any static host.
🔁
Idempotent pipeline
The KV timestamp is written before the GitHub dispatch, not after. If the dispatch fails and the Worker retries, it won't trigger a double build. Every stage is designed to be safely re-runnable.
🧠
LLM-assisted development
The entire system was architected and built using Claude Code as an active pair programmer — from data pipeline design to CSS rendering edge cases. Architecture decisions, debug sessions, and normalisation logic were all developed in iterative dialogue with the model, with the human providing domain knowledge (geography, timezone rules, data quality patterns) and the LLM translating them into correct, maintainable code.
📐
Sync invariants are documented
Where two files must stay in sync (e.g., the continent-aware city matching logic shared between index.html.tmpl and gen_worldcities.py), the constraint is explicitly documented in CLAUDE.md — so future AI-assisted edits know to update both files together.