A self-updating personal geolocation intelligence dashboard built on a fully static architecture — no server, no database, no runtime. From a Foursquare check-in to a live deployed dashboard in under 5 minutes.
A personal analytics dashboard that ingests every Foursquare/Swarm check-in and renders it into eleven pages: a main analytics dashboard; a trip journal with per-trip maps; a companions tracker; a full check-in feed with historical weather; a tips explorer with country/city tabs, closed/deleted-venue badges, view counts, and filter buttons; a venue loyalty explorer; a world cities map; a full-text search page; a photo gallery with 21 000+ images; a stats overview; and this engineering write-up — all committed to git and served via the Cloudflare Pages CDN.
Beyond rendering, the system maintains a data integrity pipeline: a manual archive
workflow snapshots the full check-in history, diffs it against the previous snapshot to detect renamed
or moved venues, and propagates those changes into the tips dataset — all without extra API calls.
Duplicate rows and check-ins that the API silently stops returning are detected on every full re-fetch
and accumulated in an incremental anomaly log (checkins_anomalies.json), giving a
permanent auditable history of every data quality event.
The entire system runs on free-tier infrastructure: Cloudflare Pages, Cloudflare Workers, Cloudflare KV, and GitHub Actions. There is no backend, no database, and no containers. The build pipeline is 100% Python 3.9+; the frontend is vanilla JS with Leaflet and Chart.js.
The system is designed around a push-on-change philosophy: nothing runs unless there is new data. A Cloudflare Worker acts as the real-time sensor, keeping the whole pipeline reactive while consuming negligible resources when idle.
The Worker runs on a 1-minute cron trigger, fetching the most recent check-in from
the Foursquare API and comparing its Unix timestamp against the last-seen value stored in
Cloudflare KV. If the timestamp is newer, the Worker writes the new value to KV
(idempotent — prevents double-triggering on retries) and fires a workflow_dispatch
event to GitHub Actions via the REST API.
GitHub Actions then fetches the updated check-in data from the private data repository,
rebuilds every generated HTML page, then commits and pushes to main. Cloudflare Pages
detects the push and deploys automatically — no build command needed, since the HTML is
already pre-built.
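The Worker's decision step is compact enough to sketch. The production Worker is JavaScript; below is an illustrative Python equivalent of the same logic, where the KV key name and the payload shape are assumptions of this sketch (GitHub's workflow_dispatch endpoint does require a ref field):

```python
from typing import Optional

# Hypothetical KV key name; the real Worker's key may differ.
LAST_SEEN_KEY = "last_checkin_ts"

def should_trigger(latest_created_at: int, last_seen: Optional[int]) -> bool:
    """Fire only when the newest check-in is strictly newer than the
    timestamp stored in KV — strict comparison keeps retries idempotent."""
    return last_seen is None or latest_created_at > last_seen

def dispatch_payload(ref: str = "main") -> dict:
    """Request body for
    POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches."""
    return {"ref": ref}
```

Writing the new timestamp to KV before firing the dispatch is what makes a retried cron tick a no-op: the second comparison sees an equal value and should_trigger returns False.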
Coordinate-based timezone lookup (timezonefinder) fails for countries that don't observe Daylight Saving Time. For example, Belarus geographically falls in a zone that would be UTC+2 in winter and UTC+3 in summer, but politically observes UTC+3 year-round. This leaves check-in local times off by one hour for half the year.
To correct this, metrics.py maintains a _COUNTRY_TZ dictionary mapping country names to authoritative IANA timezone IDs, which takes precedence over the coordinate-based lookup. Europe/Minsk is always UTC+3, regardless of what the geometric timezone boundary says — because that's what the clocks actually show.
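A minimal sketch of that precedence rule. The two-entry _COUNTRY_TZ here is a hypothetical excerpt (the real table in metrics.py is larger), and the coordinate lookup is injected as a callable so the sketch runs without timezonefinder installed:

```python
from typing import Callable, Optional

# Hypothetical excerpt; the real _COUNTRY_TZ covers more countries.
_COUNTRY_TZ = {
    "Belarus": "Europe/Minsk",  # UTC+3 year-round, whatever the geometry says
    "Russia": "Europe/Moscow",
}

def resolve_tz(country: str, lat: float, lng: float,
               coord_lookup: Optional[Callable[[float, float], str]] = None) -> str:
    """Country override first; coordinate-based lookup only as a fallback."""
    if country in _COUNTRY_TZ:
        return _COUNTRY_TZ[country]
    if coord_lookup is not None:  # e.g. TimezoneFinder().timezone_at
        return coord_lookup(lat, lng)
    return "UTC"
```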
checkins.csv contains full location history: GPS coordinates,
venue IDs, timestamps, and companion names across years of travel. Committing this
to a public repo exposes structured personal data to anyone crawling GitHub.
The raw CSV therefore lives in a separate private repository (foursquare-data). A fine-grained Personal Access Token scoped exclusively to that repository allows GitHub Actions to check it out at build time.
The public repository never sees the raw CSV — only the generated HTML output, which
contains only the aggregated, display-ready data already visible on the site.
The PAT has Contents: read/write and nothing else.
index.html.tmpl and gen_worldcities.py each embed a CTRY_CONT JavaScript dictionary mapping every country to its continent. The matchVisited() function rejects a world-cities
database match unless the candidate city's country falls on the same continent as
the visited check-in's country. This guard is maintained in sync across both files —
a documented invariant noted in CLAUDE.md.
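In Python form, the guard reduces to a continent-equality check; the three-entry CTRY_CONT below is a hypothetical miniature of the real dictionary:

```python
# Hypothetical miniature of CTRY_CONT (the real one maps every country).
CTRY_CONT = {
    "Belarus": "Europe",
    "Poland": "Europe",
    "Japan": "Asia",
}

def same_continent(candidate_country: str, visited_country: str) -> bool:
    """Reject a world-cities match unless both countries share a continent.
    Unknown countries fail closed: no continent, no match."""
    a = CTRY_CONT.get(candidate_country)
    b = CTRY_CONT.get(visited_country)
    return a is not None and a == b
```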
metrics.py runs an 8-pass pipeline over each candidate trip window
to progressively widen its boundaries until they reflect the actual journey:
"airport" in cat.lower()) rather than an exact string, catching all
variants emitted by Foursquare: International Airport, Airport
Terminal, Airport Gate, Airport Service. A
prev_end_idx guard prevents the backward scan from crossing into a
preceding trip's arrival rows. Where the heuristics fall short, three JSON config
files provide surgical overrides: trip_start_overrides
and trip_end_overrides pin exact boundary timestamps,
and trip_names.json / trip_tags.json
attach human-readable names and activity tags (bicycle, camping, etc.) keyed by the
final resolved start timestamp.
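The override step might look like the sketch below. That the override files key on the heuristic boundary timestamp serialised as a JSON-object string key is an assumption of this sketch, not something the write-up states:

```python
from typing import Dict, Tuple

def apply_overrides(start_ts: int, end_ts: int,
                    start_overrides: Dict[str, int],
                    end_overrides: Dict[str, int]) -> Tuple[int, int]:
    """Pin heuristic trip boundaries to hand-curated timestamps.
    JSON object keys are strings, hence the str() conversion."""
    start_ts = start_overrides.get(str(start_ts), start_ts)
    end_ts = end_overrides.get(str(end_ts), end_ts)
    return start_ts, end_ts
```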
The /users/self/tips endpoint silently omits any tip written on a venue that has since been closed or deleted. With no error, no flag, and no indication in the response, a full fetch of 1 782 tips appeared complete — until a per-venue sweep revealed 25 additional tips that existed only on closed venues.
A secondary problem: the API returns country names in the local language
(Беларусь, Republica Moldova, المغرب…) rather than
a consistent English form, breaking grouping, flag lookup, and tab rendering.
A Foursquare personal data export also surfaced a viewCount field entirely absent from the API response. The export further revealed that presence in checkins.csv is not a reliable proxy for venue activity: a venue can appear in historical check-ins and still be closed on Foursquare today. Determining true closed/deleted status required fetching each venue page individually.
Tips recovered by the sweep are flagged closed=True in tips.json. The 25 pre-existing sweep tips were identified retroactively by comparing the initial 1 782-tip commit in the data repo against HEAD (1 807 tips); the 25-ID delta was patched directly.
Closed/deleted status was verified by fetching foursquare.com/v/{id} with browser session cookies (the public page embeds "closed":true in its __NEXT_DATA__ JSON or in the raw HTML for closed venues). 95 of 100 venues were confirmed closed on-page; the remaining 5 loaded on the legacy app.foursquare.com renderer with no closed marker, indicating they are still active — and their tips were found to have been deleted by moderators rather than lost to venue closure.
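A sketch of the closed-marker check under the two cases described: parse the __NEXT_DATA__ blob when the page has one, otherwise fall back to a raw substring scan (the regex and the whitespace-free re-serialisation are details of this sketch):

```python
import json
import re

def is_marked_closed(page_html: str) -> bool:
    """True if the venue page embeds a "closed":true marker."""
    m = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>',
                  page_html, re.DOTALL)
    if m:
        try:
            # Re-serialise without whitespace so the substring test is stable.
            compact = json.dumps(json.loads(m.group(1)), separators=(",", ":"))
            return '"closed":true' in compact
        except ValueError:
            pass  # malformed blob: fall through to the raw scan
    return '"closed":true' in page_html
```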
The export also provided viewCount for all tips — a field the API never returns. fetch_tips.py was updated to capture viewCount going forward; historical counts from the export were backfilled into tips.json in one pass. View counts are refreshed on each full re-fetch (--full); incremental runs only touch tips newer than the latest known timestamp.
Local-language country names are normalised through a CTRY_NORM dictionary in gen_tips.py mapping every local-language variant to its English form. City names reuse the existing city_merge.yaml pipeline. Both normalised values are stored as nc (country) and nci (city) on each tip record and propagated into the recent-30 tips slice embedded in index.html.
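A minimal sketch of the normalisation pass; the three-entry CTRY_NORM is a hypothetical excerpt, and the city mapping is passed in as a plain dict standing in for the city_merge.yaml pipeline:

```python
# Hypothetical excerpt of CTRY_NORM in gen_tips.py.
CTRY_NORM = {
    "Беларусь": "Belarus",
    "Republica Moldova": "Moldova",
    "المغرب": "Morocco",
}

def normalise_tip(tip: dict, city_norm: dict) -> dict:
    """Return a copy with nc (normalised country) and nci (normalised city)."""
    out = dict(tip)
    out["nc"] = CTRY_NORM.get(tip.get("country", ""), tip.get("country", ""))
    out["nci"] = city_norm.get(tip.get("city", ""), tip.get("city", ""))
    return out
```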
Tip cards display a red CLOSED badge, a purple
DELETED badge, and a 👁 view count
in the footer. Three dedicated filter buttons — By Date, Closed only,
and Deleted only — let the reader surface each data quality category directly.
A venue referenced in checkins.csv three years ago may now carry a different name, city, or coordinates — and tips.json, which duplicates that venue metadata per tip, can drift out of sync independently.
Additionally, full re-fetches occasionally surface a second problem: some check-ins that
existed in the old CSV are simply absent from the API response (deleted or merged
venues), while other rows appear duplicated within the historical data with no indication of
the double-entry.
Both silent drift and silent data loss are impossible to detect without an explicit comparison.
sync_venue_changes.py compares the two snapshots on six fields per
venue_id: venue, city, country,
lat, lng, category. For each changed venue it
patches every matching tip in tips.json in-place — converting lat/lng to
float rounded to 5 dp to match the tips schema — and logs a numbered summary of every
updated tip. The patched tips.json is committed alongside the fresh CSV in
the same atomic commit, keeping both files permanently in sync without any additional
API quota.
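The diff-and-patch can be sketched as below, assuming each snapshot is loaded into a dict keyed by venue_id and each tip record carries a venue_id:

```python
FIELDS = ("venue", "city", "country", "lat", "lng", "category")

def diff_venues(old: dict, new: dict) -> dict:
    """Map venue_id -> {field: new value} for venues in both snapshots."""
    changes = {}
    for vid, row in new.items():
        prev = old.get(vid)
        if prev is None:
            continue  # newly seen venue: nothing to sync
        delta = {f: row[f] for f in FIELDS if row.get(f) != prev.get(f)}
        if delta:
            changes[vid] = delta
    return changes

def patch_tips(tips: list, changes: dict) -> int:
    """Apply changes in-place; lat/lng become floats rounded to 5 dp."""
    patched = 0
    for tip in tips:
        delta = changes.get(tip.get("venue_id"))
        if not delta:
            continue
        for field, value in delta.items():
            tip[field] = round(float(value), 5) if field in ("lat", "lng") else value
        patched += 1
    return patched
```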
Two anomaly classes accumulate in checkins_anomalies.json. Duplicate rows are those whose (venue_id, date) key appears more than once — identical double-entries from early Swarm usage. They are intentionally preserved in the CSV rather than silently removed; the anomaly file provides visibility without data loss. A duplicate_checkins.csv sidecar is also written for direct inspection.
Missing rows are check-ins present in the existing CSV but absent from the
API response — venues that Foursquare deleted or merged. These too are preserved and recorded
so the count discrepancy is explained and auditable. Both lists accumulate across runs:
new entries are merged in, existing entries are never removed, giving a permanent history
of every data quality event the re-fetch has ever observed.
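An accumulate-only merge in the spirit described above — the key fields ((venue_id, date) for duplicates, checkin_id for missing rows) and the file schema are assumptions of this sketch:

```python
import json
from pathlib import Path

def merge_anomalies(path: Path, new_dups: list, new_missing: list) -> dict:
    """Merge new anomalies into the log; existing entries are never removed."""
    log = {"duplicates": [], "missing": []}
    if path.exists():
        log = json.loads(path.read_text(encoding="utf-8"))
    seen_dup = {(d["venue_id"], d["date"]) for d in log["duplicates"]}
    seen_mis = {m["checkin_id"] for m in log["missing"]}
    log["duplicates"] += [d for d in new_dups
                          if (d["venue_id"], d["date"]) not in seen_dup]
    log["missing"] += [m for m in new_missing
                       if m["checkin_id"] not in seen_mis]
    path.write_text(json.dumps(log, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return log
```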
Companion and overlap detail for a check-in is only available through the legacy /v2/checkins/{id} endpoint and is no longer surfaced through any current API. With ~65 000 historical check-ins, a one-time enrichment run was the only way to recover this data before it disappeared entirely.
An early version of the enrichment script treated all HTTP 403 responses identically — marking the row as a permanent skip ("-") — so quota-exhausted rows were silently discarded alongside genuinely inaccessible ones, with no way to tell them apart after the fact.
A second defect: overlaps_name duplicated names already present in with_name / created_by_name. 408 rows were affected.
A new --only-ids-file flag accepts a plain-text file of one checkin_id per line. Before building the work queue it resets any row in that set whose overlaps_id was incorrectly finalised (back to ""), ensuring those IDs are always re-processed regardless of prior run state.
Sleep was increased from 0.35 s to 1.5 s per call to stay safely below
the quota ceiling for the full 65 000-row run.
The repaired rows were then re-derived with overlaps_name / overlaps_id scrubbed of any name already present in with_name or created_by_name, and entries reduced to "-" where nothing genuine remained. The surviving 38 genuine overlaps — people who happened to be at the same place at the same time, entirely independently — were committed to the data repo and rendered on the companions page.
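The scrub itself is a set-difference; a sketch assuming semicolon-separated name lists and "-" as the empty marker:

```python
def scrub_overlaps(row: dict) -> dict:
    """Drop from overlaps_name anyone already credited as a companion."""
    known = set()
    for field in ("with_name", "created_by_name"):
        value = row.get(field, "-")
        if value and value != "-":
            known.update(value.split(";"))
    survivors = [n for n in row.get("overlaps_name", "-").split(";")
                 if n and n != "-" and n not in known]
    out = dict(row)
    out["overlaps_name"] = ";".join(survivors) if survivors else "-"
    return out
```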
The photo index (photos.json) grows over time as new check-ins are added. A naive run would re-probe every un-indexed check-in on every CI run — including the ~51 000 already confirmed to have no photos — wasting quota and time.
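One way to avoid that waste — sketched under the assumption that photos.json stores an explicit empty list for check-ins confirmed photo-free, so a confirmed-empty entry is never probed again:

```python
def to_probe(all_checkin_ids: list, photo_index: dict) -> list:
    """Return only check-in IDs never probed before. An entry with an
    empty list counts as 'probed, no photos' and is skipped."""
    return [cid for cid in all_checkin_ids if cid not in photo_index]
```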
Some photo URLs carried an /item/{id} path rather than /checkin/{id} — matching tip IDs, not check-in IDs. They had been silently mis-classified.
photos.html gallery renders 21 000+ images lazily in batches of 300,
with a country/city accordion filter (countries collapsed by default, cities as pill
buttons), a separate tip photos section with its own lightbox mode, and a hero count
that includes both check-in and tip photos with an anchor link to the tip section.
The multi-photo badge on index.html recent-check-in cards shows the first
photo plus a +N overlay when a check-in has multiple images; clicking
navigates through all photos for that check-in in the inline lightbox.
Gallery images live in Cloudflare R2, deployed through its S3-compatible API: aws s3 sync uploads only new files (those not yet in the bucket), making each incremental deploy fast regardless of total gallery size.
The --pix-url flag keeps the build fully decoupled: local builds use a
file:/// URI; the deployed site uses the R2 public URL — no code changes
needed between environments.
The build system is intentionally minimal. build.py is the single orchestrator:
it loads YAML/JSON config, calls transform.py to normalise city and country names,
calls metrics.py to compute all aggregations and trip detection, then renders
two Jinja-free template files using simple {{PLACEHOLDER}} substitution.
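The substitution step amounts to a regex replace; a sketch, assuming unknown placeholders are deliberately left intact so a typo shows up in the output instead of vanishing:

```python
import re

def render(template: str, ctx: dict) -> str:
    """Replace every {{KEY}} with str(ctx["KEY"]); leave unknown keys as-is."""
    def substitute(match):
        key = match.group(1)
        return str(ctx[key]) if key in ctx else match.group(0)
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)
```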
The four generator scripts (gen_companions.py, gen_feed.py,
gen_venues.py, gen_worldcities.py) each embed their complete
HTML template as a base64-encoded string (_TMPL_B64).
This makes each generator fully self-contained — no external template files, no path
resolution issues regardless of working directory. To modify a page's design, you
base64-decode the string, edit the HTML/CSS/JS, re-encode.
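The round-trip is plain base64; a sketch with a hypothetical miniature template standing in for a real page:

```python
import base64

# Hypothetical miniature of a generator's embedded template.
_TMPL_B64 = base64.b64encode(b"<html>{{BODY}}</html>").decode("ascii")

def load_template() -> str:
    """Decode the embedded template at build time."""
    return base64.b64decode(_TMPL_B64).decode("utf-8")

def reencode(edited_html: str) -> str:
    """Re-encode edited HTML/CSS/JS for pasting back into the generator."""
    return base64.b64encode(edited_html.encode("utf-8")).decode("ascii")
```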
Trip detection in metrics.py runs a single-pass scan over the sorted
check-in sequence: any consecutive run of check-ins where city ≠ home_city
and the run length exceeds min_checkins (configurable) is declared a trip.
Trip names are auto-generated from the most-visited countries and cities in that sequence.
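A sketch of that single-pass scan, reading "exceeds min_checkins" as a strict comparison; the input is assumed sorted by timestamp, each check-in a dict with a city key:

```python
def detect_trips(checkins: list, home_city: str, min_checkins: int = 2) -> list:
    """Return (start_index, end_index) pairs for each run of consecutive
    non-home check-ins whose length exceeds min_checkins."""
    trips, run_start = [], None
    for i, checkin in enumerate(checkins):
        if checkin["city"] != home_city:
            if run_start is None:
                run_start = i  # a candidate trip begins
        else:
            if run_start is not None and i - run_start > min_checkins:
                trips.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(checkins) - run_start > min_checkins:
        trips.append((run_start, len(checkins) - 1))
    return trips
```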
Every choice optimises for zero operational overhead and long-term maintainability — no framework churn, no node_modules in the build pipeline, no containers to patch.
To reproduce the build, run python scripts/build.py and serve the output from any static host.
Where an invariant spans two files (index.html.tmpl and gen_worldcities.py), the constraint is explicitly documented in CLAUDE.md — so future AI-assisted edits know to update both files together.