Parallel Graph AnalytiX (PGX)
Parallel, efficient, in-memory, single-machine and distributed graph processing
Project Overview
Graphs are a powerful abstraction for knowledge discovery in large datasets because they represent relationships explicitly as edges. Graph analysis reveals latent information that is encoded, not as fields in the data, but as direct and indirect relationships among elements of the data – information that is not obvious to the naked eye, but can have tremendous value once uncovered.
PGX is a toolkit for graph analysis that supports graph algorithms (e.g., PageRank), SQL-like pattern-matching graph queries, and graph machine learning, where the latent information in graphs is extracted as ML features. Graphs can be loaded from a variety of sources including flat files, HDFS, SQL and NoSQL databases; incremental updates are also supported.
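To make the idea of a graph algorithm concrete, the following is a minimal, self-contained sketch of PageRank by power iteration in plain Python. It does not use or reproduce the PGX API: the function name `pagerank`, its parameters, and the tiny example graph are assumptions made purely for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges.

    Illustrative sketch only; not the PGX implementation or API.
    """
    vertices = sorted({v for edge in edges for v in edge})
    if not vertices:
        return {}

    # Adjacency list of outgoing edges for every vertex.
    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        # Each vertex passes a damped, equal share of its rank to its targets.
        for src in vertices:
            targets = out_neighbors[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical four-vertex graph: "c" is pointed to by both "b" and "d".
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
```

In this example the score of each vertex depends on who links to it, directly and indirectly, which is the kind of relationship-encoded information that does not appear as a field on any single record.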
The PGX toolkit includes both a single-machine in-memory engine and a distributed engine for very large graphs. The single-machine engine is highly parallel and can run in either embedded or client-server mode. The distributed engine is built for performance and scalability and includes a novel graph querying engine that can match patterns of any size while limiting memory usage.
PGX is already available both as a feature in Oracle products and as an active research project at Oracle Labs, with a world-class team of researchers further advancing the capabilities of the toolkit.