Address matching, which aims to identify addresses in address databases that refer to the same location, is a crucial task in location-based businesses such as take-out services and express delivery. It is challenging because the address of a location can be expressed in many ways, especially in Chinese. Traditional address matching approaches, which rely on string similarity and learned matching rules, can hardly handle addresses with redundant, incomplete, or unusual expressions. In this paper, to learn geographical semantic representations for address strings, we propose to obtain rich contexts for addresses from the Web through Web search engines, which substantially enriches the semantic meaning that can be learned for each address. In addition, we propose a two-stage geographical address representation learning model for address matching. In the first stage, we use an encoder-decoder architecture to learn a semantic vector representation for each address string, applying an up-sampling and sub-sampling strategy to address the problems of address redundancy and incompleteness. An attention mechanism is also applied to highlight important features of addresses in their semantic representations. In the second stage, we construct a single large graph from the corpus, with address elements and addresses as nodes and edges built from word co-occurrence information, and learn embedding representations for all nodes in the graph. Our empirical study on two real-world address datasets demonstrates that our approach greatly improves both precision (by up to 8%) and recall (by up to 12%) over state-of-the-art methods.
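To make the second-stage graph more concrete, the following is a minimal sketch of one way such a graph could be assembled. It is an illustration under stated assumptions, not the model described above: the function name `build_address_graph`, the `("addr", ...)` and `("elem", ...)` node labels, the sliding-window definition of co-occurrence, and the use of raw counts as edge weights are all hypothetical choices, and the embedding step that would run over the resulting graph is not shown.

```python
from collections import Counter


def build_address_graph(addresses, window=2):
    """Build a weighted, undirected graph over addresses and address elements.

    `addresses` maps an address id to its ordered list of address elements
    (assumed to be already segmented). Returns a Counter that maps each edge,
    stored as a frozenset of two nodes, to its weight.
    """
    edges = Counter()
    for addr_id, elements in addresses.items():
        addr_node = ("addr", addr_id)

        # Address-element edges: link each address to every element it contains.
        for elem in set(elements):
            edges[frozenset({addr_node, ("elem", elem)})] += 1

        # Element-element edges: count co-occurrences within a sliding window.
        # Assumption: raw counts are used as weights; other schemes are possible.
        for i, left in enumerate(elements):
            for right in elements[i + 1 : i + 1 + window]:
                if left != right:
                    edges[frozenset({("elem", left), ("elem", right)})] += 1
    return edges


# Toy example with made-up, already segmented addresses.
toy = {
    "a1": ["Beijing", "Haidian", "Road5"],
    "a2": ["Beijing", "Haidian"],
}
graph = build_address_graph(toy, window=1)
for edge, weight in graph.items():
    print(sorted(edge), weight)
```

In this sketch, elements shared by many addresses accumulate heavier edges, so addresses that differ in surface form but share elements end up close together in the graph.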