Introduction to Knowledge Graphs

By
Parixit Davé, Douglas Ward
April 05, 2019

Knowledge graphs have recently gained popularity as sources of factual knowledge, represented as graphs, that can be used for a variety of applications. Knowledge graphs are the ultimate data set, encoding facts with well-defined semantics modeled in an ontology. Ultimately, knowledge graphs are to AI what explicit factual memory is to human intelligence: a way to organize knowledge about the world. 

In this presentation, Dr. François Scharffe gave an overview of current applications of knowledge graphs. He explained the basics of representing, storing and querying data in a knowledge graph.

Knowledge Graphs 101

Poll Everywhere icebreaker:

  1. How much do you know about knowledge graphs (KGs)?

  2. What words would you associate with KGs?

François shared the Poll Everywhere results with the group: there was some familiarity with KGs in the room.

AI/ML/data science process:

  1. Define the business problem

  2. Build the dataset

  3. Design the model

  4. Evaluate results

Building a dataset is time-consuming for data scientists:

  • Cleaning and organizing data - 60% of time

  • Collecting data sets - 19% of time

KGs: many data sources are integrated so that they can be reused across applications and models

Hard work/heavy lifting is performed upstream

  • Data quality

  • Entity matching

  • Schema mapping

  • ETL

  • Ontology definition

AI today:

Data sets - find, prepare, clean, integrate, qualify

Prepared data - train

Model - predict

Predictions

AI tomorrow - integrated clouds of data; ongoing prediction, refining, training loops

Intelligence is dependent on memory

Knowledge graphs allow data to be represented as facts
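To make "data represented as facts" concrete, here is a minimal sketch in plain Python, assuming the common subject-predicate-object (triple) form. The Triple type, the facts_about helper, and the sample facts are all hypothetical and were not part of the talk.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: a subject, a predicate (relation), and an object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical facts, made up for illustration only.
facts = {
    Triple("Paris", "capitalOf", "France"),
    Triple("France", "memberOf", "EuropeanUnion"),
    Triple("Paris", "type", "City"),
}


def facts_about(subject: str) -> list[Triple]:
    """Return every fact whose subject matches, sorted for stable output."""
    return sorted(t for t in facts if t.subject == subject)


for fact in facts_about("Paris"):
    print(fact.subject, fact.predicate, fact.obj)
```

Because every fact has the same shape, new facts from any source can be added to the same set without changing a table layout.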

 

How to build a KG

Data source (has to be structured somehow) - ETL

2nd Data source - ETL

Repeat often, then build models based on facts, correlations, etc.
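The ETL steps above can be sketched as follows. This is an illustration under stated assumptions, not the speaker's pipeline: the two sources, their field names, and the entity_key matching rule are hypothetical, and real entity matching is usually far more involved than normalizing case and whitespace.

```python
# Two hypothetical, already-structured sources describing the same company.
crm_rows = [{"id": "c1", "name": "Acme Corp ", "city": "Boston"}]
billing_rows = [{"customer": "ACME CORP", "plan": "premium"}]


def entity_key(name: str) -> str:
    """Naive entity matching (assumption): normalize case and whitespace."""
    return "_".join(name.lower().split())


def etl_crm(rows):
    """Extract CRM rows and transform them into (subject, predicate, object) facts."""
    for row in rows:
        entity = entity_key(row["name"])
        yield (entity, "name", row["name"].strip())
        yield (entity, "locatedIn", row["city"])


def etl_billing(rows):
    """Second source: a different schema mapped onto the same entity keys."""
    for row in rows:
        yield (entity_key(row["customer"]), "hasPlan", row["plan"])


# Load: both sources end up in one graph, keyed by the matched entity.
graph = set(etl_crm(crm_rows)) | set(etl_billing(billing_rows))

for fact in sorted(graph):
    print(fact)
```

Both sources contribute facts about the same node because their names normalize to the same key, which is the point of doing entity matching and schema mapping upstream.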

 

Most models are generated using today's linear AI approach

Using KGs to generate models in an ongoing cyclical process is an active research field in ML and AI

Models can be used to predict outcomes as defined in the task, or to improve the KG itself in an ongoing growth cycle through ML

 

Graphs and schemas

RDFS standard (as defined by the W3C)

An ontology creates a schema (metadata) that dictates how data will appear in a graph

 

Schemas can be built from scratch or existing “vocabularies” can be reused
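As a rough illustration of a schema dictating how data appears in a graph, the sketch below checks facts against a small schema in plain Python. The property names, classes, and the violations helper are hypothetical. Note one simplification: RDFS itself uses domain and range declarations to infer types rather than to reject data, so this closed-world check is only in the spirit of a schema, not an RDFS implementation.

```python
# Hypothetical schema: each property maps to (domain class, range class).
schema = {
    "worksFor": ("Person", "Organization"),
    "locatedIn": ("Organization", "City"),
}

# Hypothetical entities and their declared classes.
entity_types = {"alice": "Person", "acme": "Organization", "boston": "City"}


def violations(triples):
    """Return (subject, property, object, reason) for facts that break the schema."""
    problems = []
    for s, p, o in triples:
        if p not in schema:
            problems.append((s, p, o, "unknown property"))
            continue
        domain, range_ = schema[p]
        if entity_types.get(s) != domain:
            problems.append((s, p, o, f"subject is not a {domain}"))
        elif entity_types.get(o) != range_:
            problems.append((s, p, o, f"object is not a {range_}"))
    return problems


data = [
    ("alice", "worksFor", "acme"),   # conforms to the schema
    ("acme", "worksFor", "alice"),   # subject and object are swapped
    ("alice", "likes", "boston"),    # property is not in the schema
]
for problem in violations(data):
    print(problem)
```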

One open standards stack from the W3C:

  • RDF graph modeling language

  • SPARQL graph query language

  • OWL ontology language
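SPARQL queries are built from triple patterns in which some positions are variables. The sketch below imitates that idea in plain Python so it runs without any graph library. It is not SPARQL and not any vendor's API; the match and query helpers and the sample triples are hypothetical.

```python
# Hypothetical facts, stored as (subject, predicate, object) tuples.
triples = {
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "boston"),
}


def match(pattern, bindings=None):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    bindings = bindings or {}
    for triple in triples:
        candidate = dict(bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def query(patterns):
    """Join several triple patterns, keeping only consistent bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s)]
    return solutions


# Roughly: SELECT ?person ?city
#          WHERE { ?person :worksFor ?org . ?org :locatedIn ?city }
results = query([("?person", "worksFor", "?org"), ("?org", "locatedIn", "?city")])
people = sorted(r["?person"] for r in results)
print(people)
```

The shared variable ?org is what joins the two patterns, which is how a graph query walks from a person to an organization to a city.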

 

A mix of other graph query languages:

  • GSQL

  • GQL

  • Cypher

  • Gremlin

Issues

  • Fragmentation on one side

  • Complexity on the other side

 

An ongoing effort aims to harmonize all the languages and build standards for the standards

Graphing software/vendors

  • Neo4j

  • GraphDB

  • Amazon Neptune (AWS)

  • AllegroGraph

  • TigerGraph

  • Stardog

Each has its specializations, strengths, weaknesses, etc.

 

What is an Ontology?

  • Formal specification of a shared conceptualization of a domain

    • Conceptualization: description of how we think about a domain

    • Specification: a formal way of writing the conceptualization down

    • Formal: defined by axioms in a language

    • Shared: represents a community and should be reusable
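To illustrate what "defined by axioms" can mean in practice, here is a minimal sketch of subclass axioms and the type inference they allow, written in plain Python. The class names and helpers are hypothetical, and the logic only imitates the transitive subclass reasoning found in languages such as RDFS and OWL; it is not an implementation of either.

```python
# Hypothetical axioms: each class maps to its direct superclass.
subclass_of = {"Dog": "Mammal", "Mammal": "Animal"}

# Hypothetical asserted fact: rex is a Dog.
asserted_types = {"rex": "Dog"}


def superclasses(cls):
    """Follow subclass axioms upward, collecting every ancestor class."""
    found = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in found:  # guard against cyclic axioms
            break
        found.append(cls)
    return found


def inferred_types(entity):
    """Asserted type plus every type implied by the subclass axioms."""
    direct = asserted_types[entity]
    return [direct] + superclasses(direct)


print(inferred_types("rex"))
```

Only one type was asserted for the entity; the others follow from the axioms, which is what makes a formal specification reusable across applications.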

 

http://schema.org

Open and collaborative development of an ontology for the annotation of web pages