A lot has been said on the Internet about how Content Delivery Networks (CDNs) speed up the delivery of static content for a website. Sure, it makes sense that servers in physical proximity to the end point (typically a browser) can deliver content with lower latency than servers on another continent. There were some important questions that I did not ask myself when I used Akamai’s CDN (due credit to them for nicely abstracting these details away!). Of late I have been reading about Coral, a more open CDN in an academic sense. There are some questions that need to be answered about how CDN resolution is carried out on the fly:
1) Does my application have to be aware of where a web request is coming from and respond with static file URLs pointing to the correct CDN servers?
2) How does the Internet know what server is closest to the user?
3) Which part of the Internet knows this?
I will try to answer these in order.
The short answer to the first question is “No”. An application need not figure out where a request originates and embed the appropriate static URLs. That would be an inappropriate solution simply because it would not scale, at least from the CDN’s side. What if a CDN decided to add or remove servers in a country? It would have to inform all of its subscribers/clients about the change!
So how does the Internet automatically resolve this?
The short answer to this is Anycast addressing. Anycast addressing is conceptually a method by which several endpoints on the Internet advertise the same IP address (within a LAN this would typically result in a conflict; I am sure you have seen IP address conflicts). The point of Anycast addressing when it comes to CDNs is that servers that are geographically spread out can all publicize the same IP address. As a result, when a DNS resolver tries to resolve the hostname in a1.akamai.com/image, it will always receive the same IP address from DNS servers, irrespective of whether the resolver (conveniently viewed as the client) is located in Australia or Asia. However, when the client sends a request to the resolved IP address, routers in different parts of the world understand this address differently. How is this done?
Enter the Border Gateway Protocol (BGP): a mechanism by which routers speak to each other and advertise their ability to reach particular addresses. So when two routers advertise to a router R1 that they can both reach “18.104.22.168” with weights x and y respectively, it is R1’s prerogative to choose to send messages intended for 18.104.22.168 via either of these, based on which one advertised the cheaper “reach”. This is essentially part of the shortest-path-finding mechanism that the Internet has incorporated.
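To make R1’s choice concrete, here is a minimal sketch in Python. It is not real BGP: the names `Advertisement` and `choose_next_hop`, the router names, and the single numeric `cost` are all my own assumptions, standing in for the weights x and y above (real BGP compares several path attributes, not one number). The address 192.0.2.1 is a reserved documentation address used purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    neighbor: str  # hypothetical name of the router that sent the advertisement
    address: str   # the address it claims it can reach
    cost: int      # simplified "weight"; real BGP uses richer path attributes


def choose_next_hop(advertisements, destination):
    """Return the neighbor advertising the cheapest reach to destination, or None."""
    candidates = [ad for ad in advertisements if ad.address == destination]
    if not candidates:
        return None
    return min(candidates, key=lambda ad: ad.cost).neighbor


# Two neighbors tell R1 they can both reach the same address.
ads = [
    Advertisement("router_a", "192.0.2.1", cost=3),  # weight x
    Advertisement("router_b", "192.0.2.1", cost=5),  # weight y
]

print(choose_next_hop(ads, "192.0.2.1"))  # prints router_a (the cheaper one)
```

Note that R1 never asks where the address physically lives. It only compares what its neighbors advertised.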
To answer the third question: servers set up by a CDN advertise the same IP. DNS resolvers even resolve the name to the same address at all points on the Internet. However, the routers are the gatekeepers that choose to route to the correct edge of the Internet, thanks to shortest-path-finding mechanisms that are hacked around. The router simply thinks that it is sending a request out via the shortest path it knows to the IP. The fact that the address belongs to different servers in potentially different geographic locations is not known to the routers. Ah… how we hack!
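The whole picture can be sketched as a toy simulation. Everything here is an assumption made up for illustration: the topology in `LINKS`, the link costs, the node names, and the documentation address 192.0.2.1. Real routers also do not run one global search like this; each makes a local decision from the advertisements it has heard. The sketch only shows the outcome: two edge servers advertise the same IP, and clients entering the network at different routers end up at different servers.

```python
import heapq

ANYCAST_IP = "192.0.2.1"

# Hypothetical topology: node -> {neighbor: link cost}.
LINKS = {
    "sydney_router": {"sydney_edge": 1, "singapore_router": 6},
    "singapore_router": {"sydney_router": 6, "mumbai_router": 4},
    "mumbai_router": {"singapore_router": 4, "mumbai_edge": 1},
    "sydney_edge": {"sydney_router": 1},
    "mumbai_edge": {"mumbai_router": 1},
}

# Both edge servers advertise the very same IP address.
ADVERTISERS = {"sydney_edge": ANYCAST_IP, "mumbai_edge": ANYCAST_IP}


def nearest_server(start, ip):
    """Dijkstra's shortest-path search from start; stop at the first node advertising ip."""
    queue = [(0, start)]
    seen = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in seen:
            continue
        seen.add(node)
        if ADVERTISERS.get(node) == ip:
            return node, cost
        for neighbor, weight in LINKS[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor))
    return None


for entry in ("sydney_router", "singapore_router", "mumbai_router"):
    print(entry, "->", nearest_server(entry, ANYCAST_IP))
```

The destination address is identical in every call. Only the entry point changes, and that alone decides which server answers.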
There are parts of this article that I might have oversimplified, like DNS caching and DNS resolvers. However, the best way to understand how the Internet “works” is to view these as orthogonal concepts that are built keeping each system’s autonomy in mind. Hence oversimplifying is the right way to go. Forming a holistic view will happen with time.