With over 12 million entities and 350 million relationships, Freebase is an excellent resource for text analysis. One way to approach document "understanding" is to ask how the entities in a document are connected in a knowledge graph. This is similar to the "reconciliation" process used to grow Freebase itself.
The web is currently full of semantic hints, whether explicit (like those promoted by the Semantic Web) or implicit (like the use of blog widgets). Using these hints, text-analytic methods can get a toehold on the web corpus at large.
45. (Machine) Learning Semantics
[Figure: pipeline diagram. Steps shown: extract features (from WEX); get types; intersect the two sources; calculate feature counts per type; join feature counts with topics; generate type scores for topics. Quantities labeled: 5M articles, 37M features, 5M type assertions, 2.8M Wikipedia topics, 2.4M features, 1400 types, 1.6G scores.]
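The steps named on this slide (intersect the two sources, calculate feature counts per type, join feature counts with topics, generate type scores for topics) can be sketched roughly as follows. This is a minimal illustration on toy data, not the actual pipeline: the slide does not say how features are extracted or how scores are computed, so the feature strings, the function names, and the scoring rule (summed relative feature frequency within a type) are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy stand-ins (hypothetical): features extracted per topic, and known type assertions.
topic_features = {
    "Blade Runner": ["directed by", "starring", "released"],
    "Alien": ["directed by", "starring"],
    "Ridley Scott": ["born in", "directed"],
    "Gladiator": ["directed by", "released"],  # no type assertion yet
}
type_assertions = {
    "Blade Runner": ["film"],
    "Alien": ["film"],
    "Ridley Scott": ["person"],
}

def feature_counts_per_type(topic_features, type_assertions):
    """Count how often each feature occurs among topics asserted to have each type."""
    counts = defaultdict(Counter)
    # "Intersect the two sources": keep only topics that have both features and types.
    for topic in topic_features.keys() & type_assertions.keys():
        for type_name in type_assertions[topic]:
            counts[type_name].update(topic_features[topic])
    return counts

def type_scores(topic_features, counts):
    """Join the per-type counts back onto every topic and score each candidate type.

    Assumed scoring rule: sum of the topic's feature counts within the type,
    divided by the total feature count for that type.
    """
    totals = {type_name: sum(c.values()) for type_name, c in counts.items()}
    scores = {}
    for topic, feats in topic_features.items():
        scores[topic] = {
            type_name: sum(c[f] for f in feats) / totals[type_name]
            for type_name, c in counts.items()
            if any(f in c for f in feats)  # skip types sharing no feature with the topic
        }
    return scores

counts = feature_counts_per_type(topic_features, type_assertions)
scores = type_scores(topic_features, counts)
for topic, by_type in scores.items():
    print(topic, by_type)
```

In this toy run the untyped topic "Gladiator" receives a score only for "film", because it shares features with the typed film topics and none with the person topic. A real system would likely use a weighted or probabilistic score rather than this simple ratio.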