New Technology Concepts – Using a Graph Database for a Data Warehouse
I often spend so much time helping clients measure and analyze their businesses that I end up neglecting activities where the benefit is not immediately measurable. Attending industry seminars is one of those activities, so I’m glad that I forced myself to take the time to attend TDWI’s Minneapolis meeting in December.
One of the presentations covered a concept that may not be familiar to many people, even those who work extensively on reporting and analytics projects: Graph Databases. Dan Bennett, VP of Enterprise Data Services at Thomson Reuters, covered TR’s use of Graph Databases to store data efficiently in a way that makes it easy to use across a number of applications in their organization, including reporting/analytics and providing metadata in Reuters articles.
Unlike relational databases, graph structures store descriptive elements that define the relationship between one data element and another. One way to visualize this is to think of a person and her/his relationships with other people. We all know many people, and the nature of our relationship to those people varies dramatically. We may be in the same family as them, work with them or buy our morning coffee from them.
Graph databases are composed of nodes, properties and edges. To continue with our example, a person is a Node (an object listed in the database). A Property describes the Node (female, mountain climber). And an Edge describes a connection between two Nodes.
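To make the three building blocks concrete, here is a minimal in-memory sketch in Python. It is not how any particular graph database (or Thomson Reuters) implements things; the class names, the people "alice", "bob" and "carol", and the edge labels are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph, e.g. a person (hypothetical structure)."""
    node_id: str
    properties: dict = field(default_factory=dict)  # e.g. gender, hobby


@dataclass(frozen=True)
class Edge:
    """A labeled, directed connection between two nodes."""
    source: str
    target: str
    label: str


class Graph:
    def __init__(self):
        self.nodes = {}  # node_id -> Node
        self.edges = []  # list of Edge

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, dict(properties))

    def add_edge(self, source, target, label):
        # Refuse edges that point at nodes we have never seen.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label))

    def neighbors(self, node_id, label=None):
        """Targets reachable from node_id, optionally filtered by edge label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


# Illustrative data only: the names and relationships are invented.
graph = Graph()
graph.add_node("alice", gender="female", hobby="mountain climber")
graph.add_node("bob")
graph.add_node("carol")
graph.add_edge("alice", "bob", "works_with")
graph.add_edge("alice", "carol", "buys_coffee_from")
```

Asking `graph.neighbors("alice", "works_with")` walks the edges directly rather than joining tables, which is the part of the model that makes relationship-heavy questions feel natural.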
Graph Databases still need some defined structure for what information is tracked and how. This organizational structure of the data is referred to as an 'ontology', a term borrowed from philosophy and applied here to the organization of information.
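As a rough illustration of what an ontology pins down, the Python sketch below lists which kinds of nodes exist, which properties each may carry, and which kinds of nodes each edge type may connect. Real ontologies are far richer than this, and every type name here (`Person`, `CoffeeShop`, `WORKS_WITH` and so on) is an assumption invented for the example.

```python
# A toy ontology: hypothetical node types, their allowed properties,
# and the (source type, target type) pair each edge type may connect.
ONTOLOGY = {
    "node_types": {
        "Person": {"name", "gender", "hobby"},
        "CoffeeShop": {"name"},
    },
    "edge_types": {
        "FAMILY_OF": ("Person", "Person"),
        "WORKS_WITH": ("Person", "Person"),
        "BUYS_COFFEE_AT": ("Person", "CoffeeShop"),
    },
}


def validate_node(node_type, properties):
    """True if the node type is known and uses only allowed property names."""
    allowed = ONTOLOGY["node_types"].get(node_type)
    if allowed is None:
        return False
    return set(properties) <= allowed


def validate_edge(label, source_type, target_type):
    """True if the edge label is known and connects the expected node types."""
    return ONTOLOGY["edge_types"].get(label) == (source_type, target_type)
```

With these rules, `validate_edge("WORKS_WITH", "Person", "CoffeeShop")` is rejected, because the toy ontology says people work with people, not with coffee shops.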
The person example is useful in that it may explain why Facebook and Twitter use this kind of structure to store the vast amounts of data generated by the people using their applications, especially considering that relationships between people are a big part of what they track.
It was clear from Bennett’s presentation that Thomson Reuters has a very mature implementation of this strategy, including an ontology that has been thoroughly developed. This mature ontological approach has led to other capabilities, such as a tool called OpenCalais that assigns something called a PermID.
It was obvious to me that TR’s usage of the Graph Database approach is mature, and my lack of familiarity with it is probably just as obvious. I’ll post more as I learn more.