Domain Global Graphs

Integrates PGX & Data Studio into solutions that investigate domain graphs, e.g., in the Financial Crime & Compliance Studio (FCC Studio), and researches improvements, e.g., by using Machine Learning.

Project Overview

In enterprise use cases of Oracle Labs' PGX and Data Studio projects, organizations often generate a single global graph from their domain data (e.g., financial, retail, or healthcare data). All users have access to this graph, so they can run graph analytics and conduct investigations on it. PGX is a toolkit for graph analysis that helps users reveal latent information in the graph representation of domain data. It supports running custom graph algorithms such as PageRank, performing SQL-like pattern matching on graphs with PGQL (Property Graph Query Language), and extracting graph features to enrich Machine Learning models. Oracle Labs Data Studio is a web-based notebook platform that combines live code collaboration in multiple programming languages with graph analytics and rich, interactive visualizations. These visualizations are specialized for graphs: they support filtering graphs, highlighting elements, visualizing geographical data, and expanding or contracting the graph view.
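To give a feel for the kind of graph analytics mentioned above, the following is a minimal, pure-Python sketch of PageRank by power iteration. It does not use the PGX API; the function name `pagerank`, its parameters, and the toy transfer graph are all hypothetical and chosen only for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_neighbors = {v: [] for v in nodes}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_neighbors[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_neighbors[src]
            for dst in targets:
                # Each vertex splits its damped rank evenly among its targets.
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy graph: an edge (a, b) means "a sent money to b".
transfers = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "carol")]
ranks = pagerank(transfers)
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy graph, the vertex that receives transfers from the most sources ends up with the highest score, which is the intuition behind using such centrality measures during an investigation.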

A prime example of a domain global graph solution is the Financial Crime and Compliance Studio (FCC Studio) in the financial domain. FCC Studio is an integrated workbench for financial crime data scientists, whose goal is to help catch perpetrators of predicate crimes (e.g., tax evasion, human trafficking). Using Entity Resolution techniques, FCC Studio automatically links a bank's data (such as customers, transactions, and alerts) to related data in external watchlists and data sources, forming a global financial graph. It then provides a range of tools that support the investigation of potential financial crimes in that graph:

* Anti-Money Laundering scenario authoring
* Pre-defined out-of-the-box scenarios, e.g., to calculate risk factors and red flags, and explore investigation cases
* Machine Learning enhancements
* Open tools including Apache Spark, Apache Zeppelin, R and Python
* Frequent updates to the global graph
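To make the entity-linking step described before the list more concrete, here is a minimal sketch that matches customer names against watchlist names by string similarity. It is not FCC Studio's actual Entity Resolution method; the functions `normalize` and `link_entities`, the 0.85 threshold, and the sample names are assumptions made for illustration, using only Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and collapse punctuation and whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def link_entities(customers, watchlist, threshold=0.85):
    """Return (customer, watchlist_entry, score) for pairs at or above the threshold."""
    links = []
    for customer in customers:
        for entry in watchlist:
            score = SequenceMatcher(None, normalize(customer), normalize(entry)).ratio()
            if score >= threshold:
                links.append((customer, entry, round(score, 2)))
    return links


# Hypothetical sample data; real inputs would come from bank and watchlist sources.
customers = ["Jon A. Smith", "Maria Garcia"]
watchlist = ["John A Smith", "Mario Rossi"]
links = link_entities(customers, watchlist)
print(links)
```

Each resulting link could become an edge between a customer vertex and a watchlist vertex in the global graph. A production system would likely compare more attributes than names (addresses, dates of birth, identifiers) and avoid the all-pairs comparison used here.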

The Domain Global Graph team helps integrate PGX and Data Studio into solutions that support the investigation of domain global graphs, and researches how to produce additional insight that facilitates investigation, e.g., by using Machine Learning techniques. We work on a variety of challenging research topics, including:

* Graph-based Machine Learning enhancements that use algorithms operating on vector representations of graph entities. Example use cases:
  - Improve global graph investigation, e.g., finding financial patterns by detecting similar subgraphs
  - Improve Named Entity Recognition (NER) or Entity Linking techniques, e.g., by recognizing graph entities and their relationships
  - Improve Entity Resolution (ER) techniques, e.g., by considering the graph structure and properties of the resolved entities, while keeping the results explainable
* Offloading graph entities (e.g., vertices, edges, properties) from PGX to an external store
* Efficient regular updates to the global graph
* Extract-transform-load enhancements using big data technologies (e.g., Spark)
* Graph View Authorization, e.g., allowing users to access only the parts of the graph that they are authorized to see
* Custom visualizations in Oracle Labs Data Studio
* Enhancements for deployment (e.g., using Kubernetes), testing, and performance
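As one illustration of the Graph View Authorization topic in the list above, the sketch below derives a per-user view of a graph by filtering on vertex labels. This is only one possible approach and is not taken from PGX or Data Studio; the function `authorized_view`, the dictionary-based graph layout, and the label names are hypothetical.

```python
def authorized_view(vertices, edges, allowed_labels):
    """Return the subgraph whose vertices carry a label the user may see."""
    visible = {
        vid: props for vid, props in vertices.items()
        if props["label"] in allowed_labels
    }
    # Keep an edge only when both endpoints are visible, so that no
    # reference to a hidden vertex leaks into the view.
    visible_edges = [(s, d) for s, d in edges if s in visible and d in visible]
    return visible, visible_edges


# Hypothetical toy graph: a customer owns an account that triggered an alert.
vertices = {
    "c1": {"label": "Customer"},
    "a1": {"label": "Account"},
    "al1": {"label": "Alert"},
}
edges = [("c1", "a1"), ("a1", "al1")]

# A user who is not cleared to see alerts gets a reduced view.
view_vertices, view_edges = authorized_view(vertices, edges, {"Customer", "Account"})
print(sorted(view_vertices), view_edges)
```

Filtering edges by endpoint visibility is a deliberate choice here: dropping only the hidden vertices would still reveal their existence through dangling edges.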

Principal Investigator

Iraklis Psaroudakis

Principal Member of Technical Staff

I am a researcher at Oracle Labs, Switzerland. My main research interests include improving the performance of analytical & graph workloads, parallel programming, and OS / runtime-system interaction. Prior to Oracle, I completed my Ph.D. at the Data-Intensive Application and Systems Laboratory of EPFL (Lausanne, Switzerland), focusing on scaling up highly concurrent analytical database workloads on multi-socket multi-core servers through (a) sharing data and work across concurrent queries, and (b) adaptive NUMA-aware data placement and task scheduling.

Publications