OpenStreetMap vs Reality

After comparing OpenStreetMap with Google Maps, I wanted to repeat the exercise on an area that had already been mapped.

When I moved to Edinburgh at the start of 2008, it was difficult to find an area of the city that was obviously in need of mapping. Most of the centre looked complete, or very nearly so, and there were no large blank areas. Since this left little opportunity to map from scratch, I decided it would be a useful location to review the quality of typical OSM data.

Edinburgh

Edinburgh is the capital city of Scotland. It has a population of 450,000 and covers approximately 260 km² (most of the population live within the central 100 km²).

Since Edinburgh is a much larger city than Haywards Heath, I selected an area in the centre that was approximately the same size (10 km²) and that included several types of land use (parks, commercial areas, modern housing estates, and the 200-year-old "New Town").

Edinburgh City Centre
Map © OpenStreetMap and contributors,
CC-BY-SA
 

Target Area
Map © OpenStreetMap and contributors,
CC-BY-SA

These images date from February 2008, and show a well-mapped city centre. The road network looks fairly complete, and it is not immediately obvious which areas (if any) require further mapping. The target area had contributions from around 15 individuals.

Surveying took approximately 3 months, with about 1 trip per week. This produced 3 MB of GPS data, containing 15,000 samples. Around 530 photos were taken to record street signs or road layout. Less data was collected than in Haywards Heath (9 MB of GPS data and 700 photos), since many streets had already been mapped and required no corrections.

The target area contained around 710 distinct roads, represented by about 730 ways in OSM. Haywards Heath contained around 500 roads, using 900 ways, so both road networks are of a similar complexity (Haywards Heath has more ways due to several housing estates with heavily branching streets).

You can see the current map here; it is considerably improved since 2008.

OpenStreetMap

Unlike commercial data, OSM is updated daily. In this case the target area appeared to be fairly stable, and was not heavily edited by other mappers during the review.

After re-surveying the target area, 193 "errors" were found in OpenStreetMap:


Incorrect Junction (1)
Does Not Exist (0)
Missing Road (96)
Missing Name (45)
Missing Flow (8)
Wrong Name (34)
Wrong Flow (4)
Wrong Type (1)
Wrong Shape (4)

All of these errors have now been corrected (as of June 2008).
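
The category counts above can be checked against the quoted total with a short script. This is only a minimal sketch: the counts are copied from the list above, while the names `ERROR_COUNTS`, `total_errors` and `share` are introduced here purely for illustration.

```python
# Error counts copied from the category list above.
ERROR_COUNTS = {
    "Incorrect Junction": 1,
    "Does Not Exist": 0,
    "Missing Road": 96,
    "Missing Name": 45,
    "Missing Flow": 8,
    "Wrong Name": 34,
    "Wrong Flow": 4,
    "Wrong Type": 1,
    "Wrong Shape": 4,
}


def total_errors(counts):
    """Return the total number of errors across all categories."""
    return sum(counts.values())


def share(counts, category):
    """Return one category's share of all errors, as a percentage."""
    return 100.0 * counts[category] / total_errors(counts)


if __name__ == "__main__":
    print(f"Total errors: {total_errors(ERROR_COUNTS)}")
    # List categories from most to least frequent.
    for name, count in sorted(ERROR_COUNTS.items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {count:3d}  {share(ERROR_COUNTS, name):5.1f}%")
```

Sorting by count makes it easy to see that three categories (missing roads, missing names and wrong names) dominate the total.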

Incorrect Junction

Only 1 incorrect junction was found (vs 10 in Google Maps). Two roads had been connected together despite being on different levels (one road is approximately 10 metres higher, and they are linked only by concealed steps). The junction was mapped from Yahoo! aerial imagery, where the difference in height is not obvious. It can be found here.

Does Not Exist

All roads found in OSM did actually exist (vs 14 non-existent roads in Google Maps). The non-existent roads are most likely deliberate inaccuracies in the commercial data, although it should be noted that OSM as a whole also suffers from the same problem.

Missing Road

These roads exist, but were not present in OSM. They account for about 50% of the errors (96 of 193), and are distributed fairly evenly, suggesting that this is a common problem. You can see several examples here.

Missing Name

These roads had been mapped, but were missing their name. They account for just under 25% of the errors (45 of 193), and are clustered in an area that was mapped from Yahoo! aerial imagery (highlighting one of the problems with this technique). You can see several examples here.

Missing Flow

These roads had been mapped, but were missing their one-way state. You can see an example here.

Wrong Name

These roads had an incorrect name. They account for about 18% of the errors (34 of 193), and are clustered in one area, which suggests they may have been applied at the same time. Interestingly, the commercial equivalent also contains several errors at this point, although these are different errors.

For example, "Thistle Street South West Lane" can be found here. Originally, this street was incorrectly named "Thistle Street Lane South West". Google Maps is also in error, and calls it "South West Thistle Street Lane". The street sign, showing the actual name, is here.

Wrong Flow

These roads had been marked as one-way when they were open in both directions, or were marked as one-way but with the wrong direction. You can see an example here.

Wrong Type

Only 1 instance was found (vs 6 in Google Maps) where roads had the wrong type. In this case, a section of road had been incorrectly tagged as a footpath. This was found here.

Wrong Shape

Only 4 instances were found (vs 16 in Google Maps) of roads with the wrong shape. Given that most OSM data is collected with consumer-level GPS units, the basic shape of the road network is very good. You can see an example here.

Conclusions

Effort

I used the same equipment for both Edinburgh and Haywards Heath - my bike, and an Edge 305 GPS. Approximately the same area (10 km²) was examined in both, and both required about 15 trips. Each trip lasted around 2 hours, and covered about 30 km, giving a total effort of 30 hours and 450 km.

This equates to about 6 days of full-time effort (4 days to survey, 2 days to build the map), which I suspect is close to the amount of effort needed to capture a similar map commercially. Although a commercial map captures more data (house numbers, turn restrictions, lane information, etc), it is typically done from a van rather than a bike. It is often backed up with satellite imagery, and involves a team of data collection and processing staff.
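
The survey figures follow from simple arithmetic, sketched below. The trip numbers come from the text; the 8-hour working day is an assumption made here to convert hours into days, and is not stated in the article.

```python
# Figures quoted in the text.
TRIPS = 15
HOURS_PER_TRIP = 2
KM_PER_TRIP = 30

# Assumption (not from the article): one full-time day is 8 hours.
HOURS_PER_WORKING_DAY = 8

survey_hours = TRIPS * HOURS_PER_TRIP                # total time on the bike
survey_km = TRIPS * KM_PER_TRIP                      # total distance covered
survey_days = survey_hours / HOURS_PER_WORKING_DAY   # full-time equivalent

if __name__ == "__main__":
    print(f"{survey_hours} hours, {survey_km} km, {survey_days:.2f} working days")
```

Under that assumption the survey comes to a little under 4 working days, which matches the "4 days to survey" estimate.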

Errors

Although there were twice as many errors found as in commercial data, I found it quite encouraging that an amateur effort is "only" twice as bad. Both effort and errors are within a small multiple of the commercial world, which is surprising given the difference in equipment and resources.

About 90% of the errors (175 out of 193) were due to a road being missing, having no name, or having the wrong name. Many of these are clustered together (around 50% of all errors occurred in the 1 km² in the bottom-right), and many were mapped from aerial imagery rather than a ground survey.

This style of mapping almost always results in a road network without names, and I suspect the presence of these unnamed streets on the main map probably reduces mapping activity in that area (simply by making the map look more complete than it really is). Giving unnamed roads a different appearance, so that they are obviously incomplete, would help reduce this problem.
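
Before unnamed roads can be drawn differently, they have to be identified. The sketch below shows one way this might be done, assuming ways are already available as plain dictionaries with a `tags` mapping. That data layout, the function name `unnamed_roads` and the sample IDs are hypothetical; `highway` and `name` are the standard OSM tag keys.

```python
from typing import Dict, Iterable, List


def unnamed_roads(ways: Iterable[Dict]) -> List[Dict]:
    """Return the ways that are tagged as roads but have no usable name."""
    result = []
    for way in ways:
        tags = way.get("tags", {})
        is_road = "highway" in tags
        has_name = bool(tags.get("name", "").strip())
        if is_road and not has_name:
            result.append(way)
    return result


# Hypothetical sample data; the IDs are made up for illustration.
sample_ways = [
    {"id": 1, "tags": {"highway": "residential",
                       "name": "Thistle Street South West Lane"}},
    {"id": 2, "tags": {"highway": "residential"}},      # road, no name
    {"id": 3, "tags": {"building": "yes"}},             # not a road
    {"id": 4, "tags": {"highway": "service", "name": ""}},  # empty name
]

if __name__ == "__main__":
    for way in unnamed_roads(sample_ways):
        print(f"Way {way['id']} is an unnamed road")
```

A renderer could then give the returned ways a distinct style. Note that some roads genuinely have no name, so a real implementation would probably need a way to mark those as verified.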

No roads were found that did not exist, vs 14 in Google Maps. This suggests that inserting fake roads (either complete streets, or enough small stubs and branches to act as a fingerprint) is fairly common in commercial data. Unfortunately OSM does contain similar artifacts, but it is unclear how widespread this practice is.

Completeness

As OSM coverage expands, a growing problem is that there is no way to measure either "completeness" or accuracy. It would be extremely useful to have a system that would allow tiles to be flagged as empty/mapped/surveyed, and to colour-code each tile accordingly.

This would make it a lot easier to see where further work needs to be done, and allow mappers to mark the status of the areas they have been mapping in.
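
A minimal sketch of such a tile-status scheme is shown below. Everything in it is hypothetical: the class names, the colour values, and the choice to key tiles by (zoom, x, y) are assumptions for illustration, not an existing OSM feature.

```python
from enum import Enum


class TileStatus(Enum):
    """The three states suggested in the text."""
    EMPTY = "empty"
    MAPPED = "mapped"
    SURVEYED = "surveyed"


# Hypothetical colour coding for an overlay (hex RGB strings).
STATUS_COLOURS = {
    TileStatus.EMPTY: "#d73027",     # red: nothing mapped yet
    TileStatus.MAPPED: "#fee08b",    # yellow: mapped, not ground-checked
    TileStatus.SURVEYED: "#1a9850",  # green: checked on the ground
}


class TileRegistry:
    """Stores a status flag per tile, keyed by (zoom, x, y)."""

    def __init__(self):
        self._status = {}

    def flag(self, zoom, x, y, status):
        """Record the status a mapper assigns to a tile."""
        self._status[(zoom, x, y)] = status

    def status(self, zoom, x, y):
        """Return the tile's status; unflagged tiles count as EMPTY."""
        return self._status.get((zoom, x, y), TileStatus.EMPTY)

    def colour(self, zoom, x, y):
        """Return the overlay colour for a tile."""
        return STATUS_COLOURS[self.status(zoom, x, y)]


if __name__ == "__main__":
    registry = TileRegistry()
    # Hypothetical tile coordinates.
    registry.flag(15, 100, 200, TileStatus.SURVEYED)
    print(registry.colour(15, 100, 200))
    print(registry.colour(15, 101, 200))
```

Defaulting unflagged tiles to "empty" keeps the overlay pessimistic, so an area only appears complete once a mapper has marked it as such.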