Scraping 241 UK council planning portals – 2.6M decisions so far

mebkorea about 3 hours ago 60 comments


UK planning data is technically public. In practice it's locked behind 400+ different council portals, some still running bespoke ASP.NET that looks like it dates from 2004, some behind AWS WAF, all with subtly different schemas. I've spent four months scraping them. I'm now at 241 councils and 2.6 million decisions across England, Scotland and Wales.

The scraping problem

Most UK councils run one of a handful of portal systems, Idox being the most common. In theory this makes things easy. In practice every council has configured theirs differently, some block non-browser requests via TLS fingerprinting, some have rate limits that will get you banned inside 10 minutes, and a handful are running the aforementioned bespoke ASP.NET.

I ended up writing several scrapers: a standard requests-based one, a Playwright-based one for councils that block anything that doesn't look like a real browser, and a curl_cffi one for TLS fingerprinting. Some councils I still can't get. Liverpool's portal sits behind AWS WAF with a JavaScript challenge. I have a working Playwright-based scraper that solves the challenge once and reuses cookies, but the WAF rate-limits the IP after about 10 requests and then blocks me for a day. So I have 60k Liverpool decisions from an old scrape and no easy way to add more.
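For illustration, routing councils between those three backends reduces to per-council config plus a small dispatcher. This is a hedged sketch: the council names and block lists below are made up (only Edinburgh's TLS-fingerprinting case is real), and `pick_backend` is a hypothetical helper, not code from the project.

```python
# Minimal sketch of routing councils to scraper backends.
# Council entries and their block lists are illustrative, not real data.

COUNCIL_BLOCKS = {
    "exampleshire": set(),             # plain requests works
    "edinburgh": {"tls_fingerprint"},  # needs curl_cffi's browser impersonation
    "othershire": {"non_browser"},     # needs a real browser (Playwright)
}

def pick_backend(council: str) -> str:
    """Choose which scraper implementation to use for a council."""
    blocks = COUNCIL_BLOCKS.get(council, set())
    if "non_browser" in blocks:
        return "playwright"   # drives a real browser, passes JS/UA checks
    if "tls_fingerprint" in blocks:
        return "curl_cffi"    # mimics a real browser's TLS handshake
    return "requests"         # default lightweight HTTP client
```

Unknown councils fall through to the cheapest backend, which matches the "start simple, escalate only when blocked" approach described above.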

What I found

The approval rate stuff is what most people come for. Nationally it's around 88%, but it varies wildly by ward within a council, not just between councils.

The more interesting finding came from the time-to-decision data. Across 119 English and Welsh councils, 36.5% of home extension applications missed the statutory 8-week target in 2025, up from 27.9% in 2019. Guildford is the worst at scale: 66% of decisions over target, averaging 13.3 weeks.

What it is now

A postcode checker (free) and paid PDF reports (£19/£79). Zero paying customers so far, which is fine. I've been heads down on data quality and coverage.

Site is planninglens.co.uk if you want to poke around. AMA on the scraping side – that's where the interesting problems are.


Discussion (60 Comments)

CJeffersonabout 3 hours ago
So, this sounds exciting to me, but the postcode checker really feels like spam as a user. All it tells me is 'Mixed results'. I could make a website that prints 'mixed results', I bet most results are 'mixed'!

I understand wanting to get money, but honestly, there is no way I would give money to this website in its current state; you are giving me far too little info before asking me to hand over a credit card.

Then, if someone gives you £19, a crazy amount of money honestly, the last page of the report is an advert to give them 4 times more!

mebkoreaabout 3 hours ago
Really useful feedback, cheers. Yeah, "Mixed results" is kinda rubbish as you say. It should give you something concrete before asking for anything. I'll fix that today. Fair point on the £79 upsell at the end of a £19 report too. That's tone deaf and I'll move it. On the £19... I'll think about it, but you're right the site needs to do more to justify the spend before pulling out a card. Appreciate the honest take!
CJeffersonabout 3 hours ago
Just a quick follow-up: if my reply seemed very harsh, view that as a sign of how enthusiastic I was to see the website at first. I understand wanting to make money, but I'd seriously consider giving a lot more away (maybe even the basic report stuff) for free. I'd love to explore my local area, my parents', and be nosey about what life is like in Oxford (a place I previously lived), but even if I was willing to pay (I'm not), having to stop, get a PDF, and download it really breaks the flow.
mebkoreaabout 3 hours ago
No, that's absolutely a fair follow-up and not harsh at all. It's very helpful. The "be nosey about places you used to live" use case is exactly what the postcode tool should serve (thinking about it), and right now it doesn't. You're right that PDF-downloads break flow badly. Tbh... that's a hangover from the "people want a thing they can save" assumption that I'm still stuck in, I guess. I'm still on the fence about giving the paid reports away wholesale, but the gap between "tells you nothing" and "£19 PDF" is way too big. I'm gonna need a middle layer of free but actually useful exploration on the site. Will have a solid think about this today. Appreciate the feedback!
pjc50about 2 hours ago
What benefit would people gain from the reports? Average rate of success/time is interesting, but I'm not sure what you'd do with this information other than a bit of local press discourse. I suppose it's nicely timed for the council elections?
mebkoreaabout 2 hours ago
Honest answer... I don't fully know, zero paying customers so it's still very much a hypothesis. The two use cases I think hold up: (1) people pre-buying a house with extension potential, who otherwise guess or pay £500+ for a planning consultant; (2) homeowners about to commission £2-5k of architect drawings who want a sanity check before proceeding. Someone else suggested £100-500 for a proper pre-submission review which is probably better for that second case than my £19 report. The "general state-of-area" framing is the weakest one and you're right it's mostly local press discourse — that's marketing not revenue.
lifeisstillgoodabout 2 hours ago
Some thoughts

1. Brilliant! Governments (and corps) treat public data like it’s theirs not ours. Information yearns to be free.

2. Having said that, you are likely violating T&Cs by scraping at all.

3. It is a lot easier to defend your position if you are making it free and public yourself.

4. But paying for food is nice

5. I suggest the business model here is providing architects and lawyers with strong evidence of prior planning decisions nationally

Most people applying for (difficult) planning have experience locally. But the planning system is a mess because it is not coherent nationally or regionally. The win here is not providing a copy of your data (that has legal issues) but providing pointers to decisions that support the case of the person paying you.

So I want to turn an old pub into tasteful housing and a cafe for the local village. The local planning team don't like it. I could spend money bribing them and the councillors (see how much I understand British democracy), or I could get from you the fifteen pub-to-housing conversion decisions from around the country and use that to help my bribed councillors defend their U-turn.

Everyone wins :-)

mebkoreaabout 1 hour ago
Cheers, appreciate the feedback. The architect/consultant precedent angle is interesting and a couple of other commenters have already nudged me in similar directions. Tbh... you're likely right that the strongest commercial play isn't B2C £19 reports, it's giving someone fighting a contested case the national pattern across 15 similar pub conversions, the appeal outcomes, what stuck and what didn't. That's a very different product to what I have now but the data supports it. On the T&Cs/legal stuff... I'm not going to pretend I have perfect clarity on it. The position I'd defend is that the data is statutorily public, councils are required by law to publish it, I respect rate limits, and I'm aggregating not republishing in bulk. But there is this grey area between data being public to view and being usable for a commercial product, and I haven't fully nailed it down.
lifeisstillgoodabout 1 hour ago
I agree on the “public” data issue - I spent a long time campaigning for better FOSS / data access in government and there are some great people pushing in and outside local and central gov.

But it’s a big mindset change (one that will benefit the whole country), and it’s slow.

I think the “push for public policy improvements” angle if genuine will get you a lot more respect and kudos when things get sticky. Good luck

mebkorea28 minutes ago
Cheers, and "things get sticky" isn't lost on me, especially after some of the feedback. Seems all the feedback is converging in the same direction: open-source the data layer, lead with public interest, and let any commercial product sit on top rather than be the thing. Haven't fully thought it through but maybe this is where it should go. Appreciate the gov-data campaigning context too.
simonjukabout 2 hours ago
I work with public data, and I'd love to get access to this data, but I suspect that although you have scraped the data from public websites, there are licensing and copyright implications for actually using it.

See also the open addresses project by Data Adaptive [1] which is using Freedom of Information requests to publish public council tax address data. The problem they have run into there is that their address datasets are derived from proprietary Ordnance Survey data.

It looks like data.gov.uk is in the process of standardising the planning application process, and publishing them under OGL [2].

[1]: https://www.owenboswarva.com/blog/post-addr44.htm [2]: https://www.planning.data.gov.uk/dataset/planning-applicatio...

mebkoreaabout 2 hours ago
Thanks and yeah, some of my boundary data (for the choropleth) comes from ONS open boundary files which I think are OGL, but I'd need to check the chain of derivation. On the data.gov.uk standardisation, I've seen it but last I looked it was policy and boundaries, not actual decisions. Has that changed? If they're publishing decisions under OGL I'd gladly ditch the scraping for a proper feed. On licensing more generally... I haven't fully nailed it down. I'm showing aggregates and pointing back to source, but yeah, there's a gap between "the data is public" and "do whatever you want with it commercially".
jayelbeabout 1 hour ago
As somebody who works in local government IT, consistent scraping of our data like this is the bane of our life. We get hit by thousands of these, many with no rate limiting, making hugely intensive requests, that cause downtime and knock-on effects for actual customers and citizens. We block IPs, add captchas, and yet it persists.

If you really want the data, just FOI it for goodness' sake.

I get the distinct impression that many of these outfits aren't really advocating for improved transparency but are simply trying to exploit and monetise illicitly obtained government data to make a quick buck.

mebkoreaabout 1 hour ago
Fair points, and yeah, you must be sick of mass scraping with no rate limiting. I run with 1.5-3 second delays from a single residential IP and back off when portals push back, but from your side I look the same as someone hammering you. On FOI, what you say is fair. I should probably have led with that for the trickier councils. The honest reason I haven't is that 240 FOI requests felt like they'd put a different kind of strain on councils, but if you're telling me the scraping is worse then I take that seriously. On "monetise illicitly obtained data"... I'm not going to pretend the £19 is altruism. But there is a public interest in this data being navigable across council boundaries, and that's not something individual councils can deliver. I must stress that I'm not sure I've got the model right yet, and a lot of feedback today is pushing me toward more free, which I'm seriously considering.
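The throttling described above (a fixed polite delay plus backing off when a portal pushes back) can be sketched like this. The 1.5-3 second gap is the figure from the comment; the backoff base and cap are illustrative assumptions, not values from the post:

```python
import random

def polite_delay(rng: random.Random) -> float:
    """Base gap between requests: 1.5-3 seconds, as described above."""
    return rng.uniform(1.5, 3.0)

def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Exponential backoff after a 429/403: 5s, 10s, 20s... capped at 5 minutes.

    base and cap are illustrative defaults, not from the post.
    """
    return min(base * (2 ** attempt), cap)

# Usage sketch (no real requests made here):
# for url in urls:
#     time.sleep(polite_delay(rng))
#     resp = fetch(url)                     # hypothetical fetch helper
#     if resp.status in (429, 403):
#         time.sleep(backoff_delay(attempt)); attempt += 1
```

Capping the backoff stops a long outage from parking the scraper for hours, while the exponential growth gives the portal breathing room quickly.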
sublinear44 minutes ago
Maybe I'm just naive, but why wouldn't a citizen do both?

I'm not implying that anything would get deliberately redacted, but it seems likely that information released through other channels would not match the web. A request might also reveal information that was not on the web.

What other choices are there?

efarefabout 3 hours ago
Great site. This data should really be more accessible. Planning in the UK is a total crapshoot, subject to the whims of the planning authorities. In our case (a simple rear extension and dormer loft conversion, similar to hundreds of thousands across the country), we ended up having to appeal, which added 2 years and tens of thousands of pounds in costs to our extension project. Our area shows up as a high refusal area, which tracks.

It would be good to add appeal data in (also a public gateway) to show which councils are just being unreasonable.

I personally think the planning regulations in this country are the cause of many ills, including the housing shortage. It just costs so much to get through planning these days, it is often just not worth it. Data like this could help us get that changed.

ricardobayesabout 3 hours ago
Maybe a tongue-in-cheek comment but regulations are that way because you guys want it that way (maybe not you personally). If it wasn't like that, nothing would stop a garbage incinerator or a quarry popping up a few hundred meters from houses (which happens in European countries with more deregulated planning/zoning regulations).

You guys have all kinds of pro-individualistic, borderline nonsensical residential housing laws like "right to light" and "right to view". It's completely incompatible with "build more". Most British people view their privacy (or perceived privacy) as a higher priority than fixing the housing market. "It's so overlooked" is such a common comment, and it's almost bizarre to someone used to living in a higher density environment (which the UK very much is).

jayelbeabout 2 hours ago
Waste disposal and planning for quarrying and mineral extraction are different functions, decided at a higher tier of local government, and are not directly comparable to development management/planning.
safehussabout 2 hours ago
This is awesome! Worked on something similar albeit a different industry.

For the more challenging scrapes, I'd highly recommend using the Chrome DevTools MCP to attach the network requests being made by the browser as context for your agent/LLM chat. This approach really helped me write a solid API-based scraper (also using curl_cffi) and bypass the old tedious Playwright-based approach I used to rely on.

mebkoreaabout 1 hour ago
Nice thinking. Hadn't thought of DevTools MCP that way. Curl_cffi I've used for TLS fingerprinting (Edinburgh was the first one) but the discovery side I've been doing manually... open DevTools, look at the request, copy as cURL, work out which params can be pruned. Automating that loop with an LLM in the middle would speed things up a lot, especially for the bespoke long tail. Will look into that this week. Thanks!
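The manual "copy as cURL, work out which params can be pruned" loop mentioned above is easy to semi-automate. A minimal sketch that parses just the URL and -H flags out of a DevTools "Copy as cURL" command (real commands carry more flags than this handles, e.g. --data and --compressed):

```python
import shlex

def parse_curl(cmd: str):
    """Extract (url, headers) from a DevTools 'Copy as cURL' command.

    Only handles -H/--header; everything else non-flag-like is treated
    as the URL. A sketch, not a full cURL command parser.
    """
    tokens = shlex.split(cmd)
    url, headers = None, {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "curl":
            pass
        elif tok in ("-H", "--header"):
            i += 1  # the header value is the next token
            name, _, value = tokens[i].partition(":")
            headers[name.strip()] = value.strip()
        elif not tok.startswith("-"):
            url = tok
        i += 1
    return url, headers

cmd = "curl 'https://example.gov.uk/search?ref=123' -H 'User-Agent: Mozilla/5.0' -H 'Accept: text/html'"
url, headers = parse_curl(cmd)
```

From there, pruning params is just replaying the request with each query parameter removed and checking the response still matches, which is exactly the loop an LLM could drive.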
edentabout 2 hours ago
Have you tried using FoI to get the data? I've had some success with data requests - often getting dumps in CSV or similar.

I appreciate that won't necessarily capture live / recent data. But it might be quicker than waiting for rate-limits to reset.

pbhjpbhjabout 2 hours ago
Have you spoken to any planners? A quick search for similar applications in other LAs might be a useful thing for them.

There's the Royal Town Planning Institute; they probably have a magazine you could advertise in (but equally that might get you blocked, idk).

RICS people could probably use the data too? I guess it's useful house-buyer info; houses in the vicinity had successful loft conversions, say.

On the data side - it's something of a moat for you now, but I could see you being successful with FOI requests. An MP might be interested in championing open data access.

mebkoreaabout 1 hour ago
All good points. I've been so busy with the data collection and just "irl stuff" that I haven't spoken to planners directly which is an oversight on my part — they're the obvious power users. RTPI/RICS are both on my list but as I said, I've been focusing on data more than distribution. Probably the wrong order tbh. FOI is interesting, especially for the trickier portals (Liverpool's WAF, the dead-portal ones). It might be cleaner than scraping. MP/open-data angle is definitely something I hadn't seriously considered. Worth thinking about tho! Thanks.
notarobot123about 2 hours ago
It looks like this kind of data will start to be more open in the future. New legislation introduces mandatory data standards in England: https://mhclgdigital.blog.gov.uk/2026/04/22/data-standards-l...
doublesocketabout 2 hours ago
It's the most ridiculous situation with council technology that they all use different providers for what are fundamentally the same functions. It's the same for council tax and a host of other services as it is for planning. Consequently, at least from the various portals I've used, they all do it badly. This absolutely could and should be done by a single, well funded central team.
mnkyokyfrndabout 2 hours ago
Unless you use a nationalised product for this, this is the best outcome.
doublesocketabout 2 hours ago
GDS was a national level effort and they certainly did a better, albeit not perfect, job than the myriad of private solutions councils use. There just doesn't appear to be the capability to properly specify and source IT at a council level.
morkeeabout 2 hours ago
I hate to be a downer but...

> UK planning data is technically public.

it's public, but still copyrighted by those who submitted it

the councils also have database rights over their database, unless you've obtained explicit permission from them directly

https://en.wikipedia.org/wiki/Database_right#United_Kingdom

> I ended up writing several scrapers: a standard requests-based one, a Playwright-based one for councils that block anything that doesn't look like a real browser, and a curl_cffi one for TLS fingerprinting.

so they're explicitly trying to stop you doing this, and ... you're openly admitting to bypassing their technical measures to try and stop you?

have you heard of the Computer Misuse Act?

I doubt the 240 councils are going to be happy once they find out you've done this, especially if you're selling it on for profit

mebkoreaabout 2 hours ago
Fair points and I appreciate the feedback. Database right is real but the threshold is "substantial part". I'm literally only showing aggregates and letting people search by postcode. I'm not completely republishing council databases. Think that's defensible, but not gonna pretend that it's 100% black and white. On CMA, I'd push back. That's about unauthorised access. These portals are public-facing and the data's published deliberately for people to view. Rotating user-agents isn't bypassing security in any meaningful way... I'm not breaking auth or guessing passwords. I back off when portals signal they're unhappy (Liverpool's WAF actively rate-limited me which is why that data's stale). No council has reached out so far. Could change ofc. Solo founder with no legal team though, so happy to be told I've got it wrong.
codeulikeabout 2 hours ago
I'd be careful because even though it's 'public' data, scraping it might not be legal due to the TOS of the various sites.

I did a search for my postcode and got given results for a different area and council miles away

mebkoreaabout 2 hours ago
Thanks for the feedback. On TOS: the same answer as I gave others... the data is statutorily public, I respect rate limits. That being said, I admit it's a grey area I haven't 100% nailed down. The postcode bug is more concerning. That shouldn't happen. Do you mind sharing which postcode or city/county? It could be that it's falling back to the wrong council because I don't have data for the right one, or it's a bug in my mapping. Either way, it needs fixing asap! Cheers for flagging.
codeulikeabout 1 hour ago
OK I have emailed your hello@planninglens... address with screenshots
mebkoreaabout 1 hour ago
Thanks. Will look into that right away!
niffydroidabout 2 hours ago
Ace, I can see how this could actually be quite useful for house conveyancing. You've put a lot of effort into this. How are you affected by the upcoming changes to local government? There'll no doubt be some rationalisation at some point.
pbhjpbhjabout 2 hours ago
Is any of the data on Gov.uk? Any scraping tips there? I've tried scraping some patent tribunal data but haven't been successful (just using Python and copying in session data; I guess Playwright might be useful there).
mebkoreaabout 2 hours ago
Planning data on gov.uk is really patchy and not useful for what I want. There's planning.data.gov.uk which has some boundary/policy data but no actual decisions. The decisions only exist on council portals, which is the whole reason this project exists. On patent tribunal, I haven't looked into that one specifically but a few general gov.uk tips: most gov.uk content is actually clean HTML (way easier than council portals), so if requests isn't working it's usually either JS-rendered content (Playwright fixes this) or session/cookie weirdness. Things that have helped me elsewhere: Playwright with page.wait_for_selector rather than networkidle, copying real browser headers wholesale (not just User-Agent), and checking if there's a hidden JSON API behind the page (open devtools → Network tab → look for XHR/fetch requests when you click search). Often there's a clean JSON endpoint that the page is using, which is way easier to scrape than the rendered HTML.
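The "look for XHR/fetch requests" tip above can be scripted against an exported request log. A hedged sketch; the record field names below are illustrative, not a real HAR schema, and the URLs are made up:

```python
# Given network records exported from DevTools, pick out the XHR/fetch
# calls returning JSON -- those are the "hidden API" candidates worth
# scraping directly instead of the rendered HTML.

def find_json_endpoints(records):
    """records: list of dicts with 'url', 'resource_type', 'mime_type'
    keys (illustrative field names, not a real HAR schema)."""
    return [
        r["url"]
        for r in records
        if r.get("resource_type") in ("xhr", "fetch")
        and "json" in r.get("mime_type", "")
    ]

records = [
    {"url": "https://portal.example.gov.uk/search", "resource_type": "document", "mime_type": "text/html"},
    {"url": "https://portal.example.gov.uk/api/results?page=1", "resource_type": "xhr", "mime_type": "application/json"},
    {"url": "https://portal.example.gov.uk/style.css", "resource_type": "stylesheet", "mime_type": "text/css"},
]
endpoints = find_json_endpoints(records)
```

Here only the /api/results call survives the filter, which is the endpoint you'd then hit directly with a plain HTTP client.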
ashish-alexabout 3 hours ago
Working on similar problem in another domain. I found agentic direction powerful with browser use plugged into a multimodal (strong agentic capability) llm like gpt 5.4 mini working in a loop with orchestrator evaluator/judge.
mebkoreaabout 3 hours ago
Nice! Yeah, I went the other way... deterministic scrapers per portal type because once you've worked out the search form quirks for an Idox or Northgate or Ocellaweb, it's the same shape across every council using that platform. So the marginal cost of adding council N is config not code. The agentic approach gets more interesting for the long tail though — the bespoke ASP.NET ones where every council is its own snowflake... and it is a GRIND honestly. How are you finding the loop on cost vs reliability?
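The "adding council N is config not code" idea might look something like this. The platform URL templates and council entries are invented for illustration; they are not real Idox or Northgate paths:

```python
# One URL template per portal platform, one small config entry per
# council. All names, hosts, and paths here are made up.

PLATFORM_TEMPLATES = {
    "idox": "{base}/online-applications/search.do?action=advanced",
    "northgate": "{base}/planning/search",
}

COUNCILS = {
    "exampleton": {"platform": "idox", "base": "https://planning.exampleton.gov.uk"},
    "otherford": {"platform": "northgate", "base": "https://plan.otherford.gov.uk"},
}

def search_url(council: str) -> str:
    """Build a council's search URL from its platform template."""
    cfg = COUNCILS[council]
    return PLATFORM_TEMPLATES[cfg["platform"]].format(base=cfg["base"])
```

Once the quirks of a platform live in the template and scraper code, each new council on a known platform really is just one dict entry.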
gnfargblabout 3 hours ago
Deterministic scrapers are almost certainly the right answer for this task, because once those special snowflakes have paid for their bespoke IT system, they'll never change it.

On the grind, why not get an agent to help you build the long tail of deterministic scrapers? Claude etc is really shockingly good at this kind of moderate-complexity iterative work, it will just keep going around the fetch/parse/understand loop until it has what you're looking for.

mebkoreaabout 2 hours ago
Yeah, that's essentially what I'm doing. Claude handles most of the "look at the portal, work out the search form, write the config" loop. The actual bottleneck isn't code tbh, it's that every (snowflake) council needs like 30+ minutes of investigation before you can even get going, and a chunk dead-end because the portal's broken or migrated. I already hit three this morning. Worcester returns connection refused, Breckland's URL is dead, Rother migrated to a different platform. The grind is "is this portal even alive" more than the scraper itself.
vr46about 2 hours ago
Amazing! It’s so bloody hard to access this information or even to know what there is.

Careful not to expose the councils too publicly before they shut you off

mebkoreaabout 2 hours ago
Cheers! Yeah, it's honestly mental how fragmented it is. Every council is its own little island. On the shutting-off worry: the data is statutorily public. Councils are legally required to publish it, and I'm respecting rate limits and not hammering anyone. So far no council has objected. Touch wood this remains the case. Tbh, I think the risk is more from the platform vendors than the councils themselves. It seems Idox etc have a commercial interest in this data being awkward to access.
nopurposeabout 2 hours ago
There was a story about how a similar initiative scraping court decisions was shut down.
sublimefireabout 3 hours ago
Send a message to infoshareplus.com. They might be interested in your data because they operate a business around local govs.
mebkoreaabout 3 hours ago
Thanks, hadn't come across them. I will have a poke around and reach out. Appreciate the pointer.
ferngodfatherabout 3 hours ago
Your terms:

> You may not use automated tools to scrape, copy, or bulk-download data from our service.

Pot kettle, huh.

mebkoreaabout 2 hours ago
Fair catch and pretty embarrassing... ngl. That's a generic template clause I didn't think hard enough about at the time and it's obviously contradictory given what the site does. I'll rewrite it today. The position I want to take is: scrape responsibly, respect rate limits, don't republish bulk data, which is what I try to do with the councils. Will fix the wording. Thanks.
mebkoreaabout 2 hours ago
Updated and pushed live: planninglens.co.uk/terms. Acceptable Use clause now permits programmatic access that respects rate limits, while still protecting our derived analysis and reports. Thanks for the kick.
dabeeeensterabout 3 hours ago
Have you tried using Browserless/similar to scrape around tricky hosts?
mebkoreaabout 3 hours ago
No, I haven't tried Browserless. So far, it has all been from a single residential IP which is probably the bigger issue with Liverpool than the WAF challenge itself. Once I have a valid session cookie I can solve the JS challenge fine, the rate limit is per-IP. Rotating residential proxies (or Browserless behind one) might be the answer... I'm just reluctant at this stage to bite the bullet on the cost for a single (albeit huge) council. Have you used it for similar stuff?
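The solve-once-reuse-cookies approach mentioned for Liverpool boils down to converting Playwright-style cookie dicts into a Cookie header that a lighter client can send. A sketch with made-up cookie names and domains (the domain-matching here is deliberately simplified):

```python
# Playwright's context.cookies() returns dicts shaped roughly like the
# ones below; joining the matching ones into a Cookie header lets a
# lightweight HTTP client reuse the solved-challenge session.

def cookie_header(playwright_cookies, domain: str) -> str:
    """Build a Cookie header value from Playwright-style cookie dicts.

    Naive domain match: keep cookies whose domain suffix-matches the
    target. A real client should use a proper cookie jar.
    """
    pairs = [
        f"{c['name']}={c['value']}"
        for c in playwright_cookies
        if domain.endswith(c.get("domain", "").lstrip("."))
    ]
    return "; ".join(pairs)

cookies = [
    {"name": "aws-waf-token", "value": "abc123", "domain": ".example.gov.uk"},
    {"name": "session", "value": "xyz", "domain": "portal.example.gov.uk"},
    {"name": "other", "value": "zzz", "domain": "unrelated.com"},
]
header = cookie_header(cookies, "portal.example.gov.uk")
```

The unrelated-domain cookie is dropped, and the resulting header can be attached to subsequent plain HTTP requests until the session expires or the per-IP rate limit bites again.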
imdsmabout 3 hours ago
How long did the scraping take you to build?
mebkoreaabout 2 hours ago
Around four months part-time. The bulk was the first 6 to 8 weeks building the three main scrapers (Idox, Northgate, Ocellaweb). After that, councils on those platforms are mostly config. The rest has been a long tail of bespoke portals, each taking anywhere from an evening to "give up and revisit and repeat".
beatthatflightabout 3 hours ago
Worth trying claude/gemini to see if they'll do some scraping for you. I've found some paywall sites only too happy to allow Gemini past the wall.
mebkoreaabout 3 hours ago
Hadn't thought of that tbh. Worth a go on Liverpool especially... that's the AWS WAF one I'm currently blocked on and it is doing my head in. The challenge there is volume rather than access (~80k decisions to backfill), so even if an LLM gets through the wall I'd still need to script around it. But could be a way in for the initial cookie. Cheers for the tip and will look into it.
jaggsabout 1 hour ago
I would try and go open source as fast as possible, before a legal letter lands on your desk. Then worry about the commercialisation. Also I have a feeling you could charge SERIOUS coin for some app for property developers based around this. But someone is almost certainly going to come at you because, you know, us Brits hate clever clogs.
mebkorea32 minutes ago
Yeah, unfortunately you are not wrong about the national tradition of wanting to pull down clever-clogs...

The open source angle is something I'm increasingly considering, especially after a local government IT person made a fair point on this thread about the strain it causes. It won't fix the scraping load directly, but it might frame the project as public-interest rather than an attempt to make a bit of extra money. Tbh, the real value is probably in serving property developers and consultants, not emailing £19 PDFs to homeowners. Got a lot to think about.

jaggs17 minutes ago
I think you're absolutely on the right track. It's all a matter of optics. Open source gives you the higher ground for the jobsworths. Meanwhile you can put together some kind of cool package to approach property developers with.