Sometimes the relationships between the data you've gathered are more important than the data itself. That's when a graph processing system comes in handy. It's an important but often poorly understood method for exploring how items in a data set are interrelated. (See: Facebook monetizing your list of friends.)

Microsoft has been exploring this area since at least 2013, when it published a paper describing the Trinity project, a cloud-based, in-memory graph engine. The fruits of the effort, known as the Microsoft Graph Engine, are now available as an MIT-licensed open source project and as an alternative to the likes of Neo4j or the Linux Foundation's recently announced JanusGraph.

Microsoft describes Graph Engine (GE) as "both a RAM store and a computation engine." Data can be inserted into GE and retrieved at high speed, since it's kept in memory and only written back to disk as needed. It can work as a simple key-value store like Memcached, but Redis may be the better comparison, since GE stores data in strongly typed schemas (string, integer, and so on).

The "computation engine" part of the equation means GE implements distributed algorithms across nodes, written in C#. It's not optimized out of the box for a specific kind of graph algorithm, so it'll likely appeal to those who want to write their own graph-exploration algorithms from the ground up, or simply write their own distributed algorithms.

"Instead of trying to provide an exhaustive set of built-in computation modules," states Microsoft's documentation, "GE tries to provide generic building blocks to allow us to easily build such modules." Those blocks include a system for synchronous and asynchronous message passing, as well as the LIKQ graph query language that's already used by the Academic Graph Search API in Microsoft Cognitive Services.

How does all this shape up against the leading open source graph database, Neo4j? For one, Neo4j has been in the market longer and has an existing user base. It's also available in both an open source community edition and a commercial product, whereas GE is only an open source project right now.

That said, only the commercial, enterprise-oriented edition of Neo4j supports sharding and replication. GE, by contrast, is clustered in its default open source incarnation, although clustering on both Neo4j and GE requires manual setup. In GE's case, the roles for each node in the cluster (servers and, optionally, query-aggregating proxies) need to be configured manually depending on the use case.

Another distributed graph database worth comparing to GE is JanusGraph, a new project under the sponsorship of the Linux Foundation with contributions by Google, Hortonworks, and IBM. It's been built to work closely with and leverage the Hadoop ecosystem. Elasticsearch and Lucene can be used as indexing engines, and Cassandra and HBase can be used as data stores. With GE, data has to be imported into it first.

What Microsoft appears to be aiming for with GE isn't head-on competition with those projects. Instead, GE is a piece of distributed data-storage infrastructure that receives new data and provides graph computation as one of its multiple benefits. Its liberal licensing also makes it easy to refit into other products or to repurpose for hosting at scale.

It isn't clear if Microsoft has used GE as part of any of its own systems (although it has used LIKQ, as noted above).
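
To make the "RAM store plus computation engine" idea a little more concrete, here is a minimal Python sketch of the general pattern: typed records held in memory under integer keys, with a hand-written graph-exploration routine on top. It is an illustration only and does not use GE's API (GE itself is programmed in C#). Every name in it (`PersonCell`, `MemoryStore`, `within_hops`) is hypothetical, and it runs in a single process with no clustering, message passing, or disk persistence. Note also that Python only declares the field types here; it does not check them at runtime.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PersonCell:
    """A record with declared field types (hypothetical schema, not GE's)."""
    name: str
    age: int
    friends: list[int] = field(default_factory=list)  # ids of neighboring cells


class MemoryStore:
    """Toy in-memory key-value store that only accepts PersonCell values."""

    def __init__(self) -> None:
        self._cells: dict[int, PersonCell] = {}

    def save(self, cell_id: int, cell: PersonCell) -> None:
        # Reject anything that is not the declared record type.
        if not isinstance(cell, PersonCell):
            raise TypeError("only PersonCell values can be stored")
        self._cells[cell_id] = cell

    def load(self, cell_id: int) -> PersonCell:
        return self._cells[cell_id]


def within_hops(store: MemoryStore, start: int, max_hops: int) -> set[int]:
    """Breadth-first exploration: ids reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cell_id, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in store.load(cell_id).friends:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen - {start}


store = MemoryStore()
store.save(1, PersonCell("Ada", 36, friends=[2]))
store.save(2, PersonCell("Grace", 45, friends=[3]))
store.save(3, PersonCell("Alan", 41))
print(sorted(within_hops(store, 1, 2)))  # prints [2, 3]
```

The traversal is ordinary user code layered on a generic store, which loosely mirrors the "building blocks rather than built-in modules" approach described in the article. In a distributed setting, the `load` call for a neighbor held on another node would become a message to that node.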