Network analysis: Visualization of connections & relations between named entities (Solr graph view)

The graph/network analysis view shows the direct and indirect relations, connections and networks between named entities such as persons, organizations or main concepts that occur together (co-occurrences) in your content, data sources and documents.

To use this view, enable the Open Source ETL plugin for integration with the Neo4j graph database in the config file.
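The difference between direct and indirect connections can be illustrated with a minimal sketch in plain Python. It is not the Neo4j integration or the graph view's own code: the edge list and entity names are hypothetical, and a breadth-first search stands in for whatever traversal the graph database performs.

```python
from collections import deque

# Hypothetical co-occurrence edges: each pair appeared together in a document.
edges = [
    ("Alice Smith", "Acme Corp"),
    ("Acme Corp", "Bob Jones"),
    ("Bob Jones", "Carol White"),
]

def build_graph(edge_list):
    """Build an undirected adjacency map from pairs of entity names."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the chain of entities or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = build_graph(edges)
# A path of two entities is a direct connection; a longer path is indirect.
path = shortest_path(graph, "Alice Smith", "Bob Jones")
print(" -> ".join(path))
```

Here "Alice Smith" and "Bob Jones" never occur in the same document, but both co-occur with "Acme Corp", which makes the connection an indirect one.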

Extracted named entities such as persons, organizations or locations (named entity extraction) are used for structured navigation, aggregated overviews and interactive filters (faceted search). They also provide leads for connections and networks, because you can analyze which persons, organizations or places occur together and in how many documents.
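The co-occurrence analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `documents` mapping and its entity names are hypothetical, and the search engine itself derives such counts from its index rather than from a script like this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of named entities extracted from each document.
documents = {
    "doc1": {"Alice Smith", "Acme Corp", "Berlin"},
    "doc2": {"Alice Smith", "Acme Corp"},
    "doc3": {"Bob Jones", "Berlin"},
}

def count_cooccurrences(entities_by_doc):
    """Count, for each pair of entities, how many documents contain both."""
    counts = Counter()
    for entities in entities_by_doc.values():
        # Sorting gives every pair one canonical order, e.g. (A, B) not (B, A).
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts

cooccurrences = count_cooccurrences(documents)
for (a, b), n in cooccurrences.most_common():
    print(f"{a} <-> {b}: {n} document(s)")
```

Each counted pair corresponds to one edge in the network, and the document count can serve as the weight of that edge.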

In addition to the known named entities from a thesaurus or imported ontologies, other data analysis plugins integrate Named Entity Recognition (NER) by spaCy and/or the Stanford Named Entity Recognizer (Stanford NER).

By integrating machine learning that analyses the structure of the text and classifies parts or words of a sentence into categories such as person, location or organization, many previously unknown named entities can be extracted, even if they are not yet configured or listed in the thesaurus, a list of names or an ontology.

To do so, it uses models trained on existing annotations of a large text corpus, so that they can then "predict" (or rather, estimate by probability) whether a part of a sentence is the name of a person, the name of an organization, a verb or a place.

Since no machine learning algorithm or model is perfect, the search engine combines this analysis with other methods and with data curated by human editors.

You can therefore add important names to the thesaurus, so that the search engine extracts them even if named entity recognition fails.
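How curated thesaurus entries can back up a statistical recognizer is shown in the following minimal sketch. It assumes a hypothetical `thesaurus` dictionary and a hypothetical NER result, and uses simple whole-word matching. The search engine's actual matching logic may differ.

```python
import re

# Hypothetical thesaurus: names curated by editors, mapped to an entity type.
thesaurus = {
    "Jane Doe": "person",
    "Example Foundation": "organization",
}

def thesaurus_matches(text, entries):
    """Find thesaurus names that occur in the text as whole words (case-insensitive)."""
    found = {}
    for name, label in entries.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found[name] = label
    return found

def combine_entities(ner_entities, text, entries):
    """Merge NER output with thesaurus matches; curated entries take precedence."""
    combined = dict(ner_entities)
    combined.update(thesaurus_matches(text, entries))
    return combined

text = "Yesterday jane doe met John Roe at the Example Foundation."
# Hypothetical NER output that missed the lowercase "jane doe".
ner_entities = {"John Roe": "person"}

entities = combine_entities(ner_entities, text, thesaurus)
print(entities)
```

The name the recognizer missed is still extracted because it is listed in the thesaurus, while names found only by the recognizer are kept as well.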

You don't have to add each name yourself:

With the ontologies manager you can import thousands of names from Open Data sources such as Wikidata, which offers a universal structured database containing names of people, for example lists of politicians and members of parliament(s).

Entities added to the thesaurus are also added to the OCR dictionary, so the automatic OCR integration recognizes them more reliably in scanned documents, for example in images within PDF files.

Since no automatic analysis, tagging or annotation is perfect, you can tag documents manually with the semantic tagger or annotate parts within documents with the Hypothesis annotator.
