The document describes importing data from a CSV file into a Neo4j graph database using a test-driven approach. The tests validate that the import process creates nodes for countries, cities, addresses and their relationships correctly based on the data in the CSV. The tests are executed against an in-memory Neo4j instance created for each test. Cypher queries are used to verify the graph structure and content matches the expected output.
7. Scoping - MVP EMERGENCE
As a journalist, I need to quickly find people to interview in relation to a particular health product.
For example:
Who are the managers of the pharmaceutical labs producing a faulty drug?
Who are the health professionals most influenced by these labs?
Who are the patients’ relatives, friends, colleagues...?
8. Backlog
● Find the address of a lab
● Find labs that own a specific drug
● Find health professionals related to/influenced by labs
● Find the health professionals most influenced by labs within a year
● Find patients related to health professionals
● Find patients’ relatives, friends, colleagues
● ...
18. Technical Stakeholder interview
“Neo4j Inc. is like NoSQL, it has no future, right?”
19. Neo4j: leading graph database for more than 10 years!
2000: Performance issues with document management systems; first graph library prototypes
2002: Development of the first version of Neo4j
2007: Neo Technology is created
2010: Neo4j 1.0 is out, followed by the headquarters' move to Silicon Valley
2013: Neo4j 2.0 (labels added to the graph model, Neo4j browser reworked)
2016: Neo4j 3.0 (Bolt protocol, Cypher extensions)
2017: Neo4j 3.3 (Neo Technology -> Neo4j Inc., Neo4j Desktop with Enterprise Edition)
20. Technical Stakeholder interview
“But then, why Neo4j and NOT another graph database?”
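21. DETOUR: NATIVE GRAPH DATABASE
[Diagram: a property graph with three nodes. (:Person:Speaker {first_name: Marouane, age: 30, shoe_size: 42}) and (:Person:Org {first_name: Hanae}) each have an ATTENDS relationship to (:Conference {name: Devoxx Morocco}); one ATTENDS relationship carries the property since: 2015, and an EMAILED relationship links the two people]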
22. DETOUR: NATIVE GRAPH DATABASE
[Diagram: the same graph broken down into its building blocks: nodes, relationships (ATTENDS, EMAILED), properties (name, first_name, age, shoe_size, since) and labels (:Person, :Org, :Conference, :Speaker)]
26. DETOUR: NATIVE GRAPH DATABASE
[Diagram: the relationship record between a start node (SN) and an end node (EN) stores four pointers: SN PrevRel, SN NextRel, EN PrevRel and EN NextRel, i.e. the previous and next relationships of each node (∅ when there is none)]
Index-free adjacency: every co-located piece of data in the graph is co-located on the disk
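The pointer layout on this slide can be illustrated with a small in-memory model. This is only a sketch: the record types, the field names and the `NONE` marker are hypothetical simplifications and not Neo4j's actual store format, and the direction of EMAILED is assumed. It shows how all relationships of a node are reached by following stored positions, without any index lookup.

```kotlin
// Hypothetical, simplified records: real Neo4j store files hold more fields than this.
const val NONE = -1 // plays the role of ∅ in the diagram

data class NodeRecord(val firstRel: Int)

data class RelRecord(
    val type: String,
    val startNode: Int, val endNode: Int,
    val startPrev: Int, val startNext: Int, // SN PrevRel / SN NextRel
    val endPrev: Int, val endNext: Int      // EN PrevRel / EN NextRel
)

class Store(val nodes: List<NodeRecord>, val rels: List<RelRecord>) {
    // Walks the relationship chain of one node by following stored positions only.
    // Self-relationships are not handled in this sketch.
    fun relationshipsOf(nodeId: Int): List<RelRecord> {
        val found = mutableListOf<RelRecord>()
        var current = nodes[nodeId].firstRel
        while (current != NONE) {
            val rel = rels[current]
            found += rel
            // Pick the "next" pointer that belongs to the node being traversed
            current = if (rel.startNode == nodeId) rel.startNext else rel.endNext
        }
        return found
    }
}

// Nodes: 0 = Marouane, 1 = Hanae, 2 = Devoxx Morocco
fun sampleStore(): Store = Store(
    nodes = listOf(NodeRecord(firstRel = 0), NodeRecord(firstRel = 1), NodeRecord(firstRel = 0)),
    rels = listOf(
        RelRecord("ATTENDS", 0, 2, NONE, 2, NONE, 1), // rel 0: Marouane -> Devoxx Morocco
        RelRecord("ATTENDS", 1, 2, NONE, 2, 0, NONE), // rel 1: Hanae -> Devoxx Morocco
        RelRecord("EMAILED", 0, 1, 0, NONE, 1, NONE)  // rel 2: Marouane -> Hanae (direction assumed)
    )
)
```

Calling `sampleStore().relationshipsOf(2)` follows the chain of the conference node and visits both ATTENDS records; the cost depends on the number of relationships of that node, not on the size of the graph.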
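49. LAB IMPORT - TDD style - Test skeleton
import org.assertj.core.api.Assertions.assertThat
import org.junit.Rule
import org.junit.Test
import org.neo4j.harness.junit.Neo4jRule

class MyClassTest {
    // Starts a throwaway Neo4j instance for each test
    @get:Rule val graphDb = Neo4jRule()

    @Test
    fun `some interesting test`() {
        val subject = MyClass(graphDb.boltURI().toString())
        subject.importDataset("/dataset.csv")
        graphDb.graphDatabaseService.execute("MATCH (s:Something) RETURN s").use {
            assertThat(it) // ...
        }
    }
}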
50. LAB IMPORT - TDD style - companies.csv
identifiant,pays_code,pays,secteur_activite_code,secteur,denomination_sociale,adresse_1,adresse_2,adresse_3,adresse_4,code_postal,ville
QBSTAWWV,[FR],FRANCE,[PA],Prestataires associés,IP Santé domicile,16 Rue de Montbrillant,Buroparc Rive Gauche,"","",69003,LYON
MQKQLNIC,[FR],FRANCE,[DM],Dispositifs médicaux,SIGVARIS,ZI SUD D'ANDREZIEUX,RUE B. THIMONNIER,"","",42173,SAINT-JUST SAINT-RAMBERT CEDEX
OETEUQSP,[FR],FRANCE,[AUT],Autres,HEALTHCARE COMPLIANCE CONSULTING FRANCE SAS,47 BOULEVARD CHARLES V,"","","",14600,HONFLEUR
FRQXZIGY,[FR],FRANCE,[MED],Médicaments,SANOFI PASTEUR MSD SNC,162 avenue Jean Jaurès,"","","",69007,Lyon
GXIVOHBB,[FR],FRANCE,[PA],Prestataires associés,ISIS DIABETE,10-16 RUE DU COLONEL ROL TANGUY,ZAC DU BOIS MOUSSAY,"","",93240,STAINS
ZQKPAZKB,[FR],FRANCE,[PA],Prestataires associés,CREAFIRST,8 Rue de l'Est,"","","",92100,BOULOGNE BILLANCOURT
GEJLGPVD,[US],ÉTATS-UNIS,[DM],Dispositifs médicaux,Nobel Biocare USA LLC,800 Corporate Drive,"","","",07430,MAHWAH
XSQKIAGK,[FR],FRANCE,[DM],Dispositifs médicaux,Cook France SARL,2 Rue due Nouveau Bercy,"","","",94227,Charenton Le Pont Cedex
ARHHJTWT,[FR],FRANCE,[DM],Dispositifs médicaux,EYETECHCARE,2871 Avenue de l'Europe,"","","",69140,RILLIEUX-LA-PAPE
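Before such a file can be sent to Neo4j, each line has to be turned into a parameter map. The deck does not show that step, so the following is only a sketch under assumptions: `splitCsvLine` and `toRow` are hypothetical helpers, and a real import would more likely rely on a CSV library. The keys `country_code` and `country_name` are the ones read by the country import query of this deck.

```kotlin
// Minimal CSV line splitter: handles double-quoted fields and "" escapes, nothing more.
fun splitCsvLine(line: String): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            // An escaped quote ("") inside a quoted field
            inQuotes && c == '"' && i + 1 < line.length && line[i + 1] == '"' -> {
                current.append('"')
                i++
            }
            c == '"' -> inQuotes = !inQuotes
            c == ',' && !inQuotes -> {
                fields += current.toString()
                current.setLength(0)
            }
            else -> current.append(c)
        }
        i++
    }
    fields += current.toString()
    return fields
}

// Maps one data line to the row sent as a Cypher parameter.
// Looking columns up by header name is an assumption of this sketch.
fun toRow(header: List<String>, line: String): Map<String, String> {
    val values = header.zip(splitCsvLine(line)).toMap()
    return mapOf(
        "country_code" to values.getValue("pays_code"),
        "country_name" to values.getValue("pays")
    )
}
```

The list of rows built this way is what an `UNWIND {rows} AS row` clause iterates over.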
51. LAB IMPORT - TDD style - COUNTRIES
@Test
fun `imports countries of companies`() {
    newReader("/companies.csv").use {
        subject.import(it)
    }
    graphDb.graphDatabaseService.execute(
            "MATCH (country:Country) " +
            "RETURN country {.code, .name} " +
            "ORDER BY country.code ASC").use {
        assertThat(it).containsExactly(
                row("country", mapOf(Pair("code", "[FR]"), Pair("name", "FRANCE"))),
                row("country", mapOf(Pair("code", "[US]"), Pair("name", "ÉTATS-UNIS")))
        )
    }
    // The whole import must be committed in a single transaction
    assertThat(commitCounter.getCount()).isEqualTo(1)
}
52. LAB IMPORT - TDD style - COUNTRIES
session.run("""
    UNWIND {rows} AS row
    MERGE (country:Country {code: row.country_code})
    ON CREATE SET country.name = row.country_name
    """.trimIndent(), mapOf(Pair("rows", rows)))
57. LAB IMPORT - TDD style - ADDRESSES
@Test
fun `imports addresses`() {
    newReader("/companies.csv").use {
        subject.import(it)
    }
    graphDb.graphDatabaseService.execute(
            "MATCH (address:Address) " +
            "RETURN address {.address} ").use {
        assertThat(it).containsOnlyOnce(
                row("address", mapOf(Pair("address", "16 RUE DE MONTBRILLANT\nBUROPARC RIVE GAUCHE"))),
                row("address", mapOf(Pair("address", "ZI SUD D'ANDREZIEUX\nRUE B. THIMONNIER"))),
                row("address", mapOf(Pair("address", "47 BOULEVARD CHARLES V"))),
                row("address", mapOf(Pair("address", "162 AVENUE JEAN JAURÈS"))),
                row("address", mapOf(Pair("address", "10-16 RUE DU COLONEL ROL TANGUY\nZAC DU BOIS MOUSSAY"))),
                row("address", mapOf(Pair("address", "8 RUE DE L'EST"))),
                row("address", mapOf(Pair("address", "800 CORPORATE DRIVE"))),
                row("address", mapOf(Pair("address", "2 RUE DUE NOUVEAU BERCY"))),
                row("address", mapOf(Pair("address", "2871 AVENUE DE L'EUROPE")))
        )
    }
    assertThat(commitCounter.getCount()).isEqualTo(1)
}
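The expected values in this test suggest that the non-empty address columns are joined with a line break and upper-cased. The deck does not show that code, so the helper below is a hypothetical sketch of one way to obtain such values; the name `toAddress` and the choice of `Locale.ROOT` are assumptions.

```kotlin
import java.util.Locale

// Joins the non-blank address lines with a newline and upper-cases the result.
fun toAddress(vararg lines: String): String =
    lines.map { it.trim() }
        .filter { it.isNotEmpty() }
        .joinToString("\n")
        .uppercase(Locale.ROOT)
```

For the first row of companies.csv, `toAddress("16 Rue de Montbrillant", "Buroparc Rive Gauche", "", "")` yields the two-line value expected by the test.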
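68. LAB IMPORT - TDD style - batch import
import java.util.concurrent.atomic.AtomicInteger
import org.neo4j.graphdb.event.TransactionData
import org.neo4j.graphdb.event.TransactionEventHandler

// Counts committed transactions so that tests can assert on the number of commits
class CommitCounter : TransactionEventHandler<Any?> {
    private val count = AtomicInteger(0)
    override fun afterRollback(p0: TransactionData?, p1: Any?) {}
    override fun beforeCommit(p0: TransactionData?): Any? = null
    override fun afterCommit(p0: TransactionData?, p1: Any?) {
        count.incrementAndGet()
    }
    fun getCount(): Int = count.get()
    fun reset() = count.set(0)
}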
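The tests assert a commit count of 1, which fits an import that sends its rows in batches, one transaction per batch. The batching code is not part of the deck; the sketch below is a hypothetical illustration in which `importInBatches` and its `commitBatch` callback are assumed names, with the callback standing in for a `session.run(...)` call inside a transaction.

```kotlin
// Hypothetical helper: sends rows in fixed-size batches, one commit per batch,
// and returns how many commits were made.
fun <T> importInBatches(rows: List<T>, batchSize: Int, commitBatch: (List<T>) -> Unit): Int {
    require(batchSize > 0) { "batchSize must be positive" }
    val batches = rows.chunked(batchSize)
    batches.forEach(commitBatch)
    return batches.size
}
```

With the nine rows of companies.csv and any batch size of nine or more, a single batch is committed, which is what a commit counter would report as 1.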
69. Backlog
● Find the address of a lab
● Find labs that own a specific drug
● Find health professionals related to/influenced by labs
● Find the health professionals most influenced by labs within a year
● Find patients related to health professionals
● Find patients’ relatives, friends, colleagues
● ...
76. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+)
Publishing an extension 101
● Write the extension in any JVM language (Java, Scala, Kotlin…)
● Package a JAR
● Deploy the JAR to your Neo4j server: $NEO4J_HOME/plugins
78. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+)
import org.neo4j.procedure.Name
import org.neo4j.procedure.UserFunction

class MyFunction {
    @UserFunction(name = "my.function")
    fun doSomethingAwesome(@Name("input1") input1: String, @Name("input2") input2: String): Double {
        TODO("do something awesome...")
    }
}
80. CYPHER EXTENSION - User-Defined FUNCTION (neo4j 3.1+)
@UserFunction(name = "strings.similarity")
fun computeSimilarity(@Name("input1") input1: String, @Name("input2") input2: String): Double {
    if (input1 == input2) return totalMatch
    val whitespace = Regex("\\s+")
    val words1 = normalizedWords(input1, whitespace)
    val words2 = normalizedWords(input2, whitespace)
    if (words1 == words2) return totalMatch
    val matchCount = AtomicInteger(0)
    val initialPairs1 = allPairs(words1)
    val initialPairs2 = allPairs(words2)
    val pairs2 = initialPairs2.toMutableList()
    // Each pair of the second input can be matched at most once
    initialPairs1.forEach {
        val pair1 = it
        val matchIndex = pairs2.indexOfFirst { it == pair1 }
        if (matchIndex > -1) {
            matchCount.incrementAndGet()
            pairs2.removeAt(matchIndex)
            return@forEach
        }
    }
    // Twice the number of shared pairs over the total number of pairs
    return 2.0 * matchCount.get() / (initialPairs1.size + initialPairs2.size)
}
83% of matches!
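The function on the slide relies on `totalMatch`, `normalizedWords` and `allPairs`, which the deck does not show. The sketch below fills them in with assumed definitions (a score of 1.0, upper-cased words, adjacent letter pairs per word) to give a self-contained version without the Neo4j annotations. Under those assumptions the score is a Sørensen–Dice coefficient over letter pairs; the empty-input guard is an addition that is not in the slide.

```kotlin
const val totalMatch = 1.0 // assumed value, not shown in the deck

// Assumed helper: upper-cases the input and splits it into non-empty words.
fun normalizedWords(input: String, separator: Regex): List<String> =
    input.trim().uppercase().split(separator).filter { it.isNotEmpty() }

// Assumed helper: adjacent letter pairs of every word ("LYON" gives LY, YO, ON).
fun allPairs(words: List<String>): List<String> =
    words.flatMap { word -> word.windowed(size = 2) }

fun similarity(input1: String, input2: String): Double {
    if (input1 == input2) return totalMatch
    val whitespace = Regex("\\s+")
    val pairs1 = allPairs(normalizedWords(input1, whitespace))
    val pairs2 = allPairs(normalizedWords(input2, whitespace))
    if (pairs1.isEmpty() && pairs2.isEmpty()) return 0.0 // guard added in this sketch
    val remaining = pairs2.toMutableList()
    // remove() deletes the first equal pair and reports whether one was found,
    // so each pair of the second input is matched at most once
    val matchCount = pairs1.count { remaining.remove(it) }
    return 2.0 * matchCount / (pairs1.size + pairs2.size)
}
```

For example, "SANOFI PASTEUR MSD SNC" has 15 letter pairs and "SANOFI PASTEUR MSD" has 13, all of them shared, so `similarity` returns 26/28 for that couple of names.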
82. DETOUR - Neo4jRule and user-defined functions
@get:Rule
val graphDb = Neo4jRule()
        .withFunction(StringSimilarityFunction::class.java)
83. Drug import
session.run("""
    UNWIND {rows} AS row
    MERGE (drug:Drug {cisCode: row.cisCode})
    ON CREATE SET drug.name = row.drugName
    WITH drug, row
    UNWIND row.labNames AS labName
    """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
84. Drug import
session.run("""
    UNWIND {rows} AS row
    MERGE (drug:Drug {cisCode: row.cisCode})
    ON CREATE SET drug.name = row.drugName
    WITH drug, row
    UNWIND row.labNames AS labName
    MATCH (lab:Company)
    WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity
    """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
85. Drug import
session.run("""
    UNWIND {rows} AS row
    MERGE (drug:Drug {cisCode: row.cisCode})
    ON CREATE SET drug.name = row.drugName
    WITH drug, row
    UNWIND row.labNames AS labName
    MATCH (lab:Company)
    WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity
    WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab, labName
    ORDER BY similarity DESC
    WITH drug, labName, HEAD(COLLECT(lab)) AS lab
    """.trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
86. Drug import
session.run("""
    UNWIND {rows} AS row
    MERGE (drug:Drug {cisCode: row.cisCode})
    ON CREATE SET drug.name = row.drugName
    WITH drug, row
    UNWIND row.labNames AS labName
    MATCH (lab:Company)
    WITH drug, lab, labName, strings.similarity(labName, lab.name) AS similarity
    WITH drug, CASE WHEN similarity > {threshold} THEN lab ELSE NULL END AS lab, labName
    ORDER BY similarity DESC
    WITH drug, labName, HEAD(COLLECT(lab)) AS lab
    FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END |
        MERGE (lab)<-[:DRUG_HELD_BY]-(drug))
    FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END |
        MERGE (fallback:Company:Ansm {name: labName})
        MERGE (fallback)<-[:DRUG_HELD_BY]-(drug)
    )""".trimIndent(), mapOf(Pair("rows", rows), Pair("threshold", labNameSimilarity)))
87. CYPHER TRICKS - FOREACH as poor man’s IF
FOREACH (item IN collection | ...do something...)

FOREACH (ignored IN CASE WHEN lab IS NOT NULL THEN [1] ELSE [] END |
    MERGE (lab)<-[:DRUG_HELD_BY]-(drug))
FOREACH (ignored IN CASE WHEN lab IS NULL THEN [1] ELSE [] END |
    MERGE (fallback:Company:Ansm {name: labName})
    MERGE (fallback)<-[:DRUG_HELD_BY]-(drug)
)
88. Drug import - API
@RestController
class LabsApi(private val repository: LabsRepository) {
    @GetMapping("/packages/{package}/labs")
    fun findLabsByMarketedDrug(@PathVariable("package") drugPackage: String): List<Lab> {
        return repository.findAllByMarketedDrugPackage(drugPackage)
    }
}
89. Drug import - REPOSITORY
@Repository
class LabsRepository(private val driver: Driver) {
    fun findAllByMarketedDrugPackage(drugPackage: String): List<Lab> {
        driver.session(AccessMode.READ).use {
            val result = it.run("""
                MATCH (lab:Company)<-[:DRUG_HELD_BY]-(:Drug)-[:DRUG_PACKAGED_AS]->(:Package {name: {name}})
                OPTIONAL MATCH (lab)-[:IN_BUSINESS_SEGMENT]->(segment:BusinessSegment),
                               (lab)-[:LOCATED_AT_ADDRESS]->(address:Address),
                               (address)-[cityLoc:LOCATED_IN_CITY]->(city:City),
                               (city)-[:LOCATED_IN_COUNTRY]->(country:Country)
                RETURN lab {.identifier, .name},
                       segment {.code, .label},
                       address {.toAddress},
                       cityLoc {.zipcode},
                       city {.name},
                       country {.code, .name}
                ORDER BY lab.identifier ASC""".trimIndent(), mapOf(Pair("name", drugPackage)))
            return result.list().map(this::toLab)
        }
    }
}
90. Backlog
● Find the address of a lab
● Find labs that own a specific drug
● Find health professionals related to/influenced by labs
● Find the health professionals most influenced by labs within a year
● Find patients related to health professionals
● Find patients’ relatives, friends, colleagues
● ...
92. TOP 3 Health Professionals - API
@RestController
class HealthProfessionalApi(private val repository: HealthProfessionalsRepository) {
    @GetMapping("/benefits/{year}/health-professionals")
    fun findTop3ProfessionalsWithBenefits(@PathVariable("year") year: String)
            : List<Pair<HealthProfessional, AggregatedBenefits>> {
        return repository.findTop3ByMostBenefitsWithinYear(year)
    }
}
93. TOP 3 Health Professionals - REPOSITORY
@Repository
class HealthProfessionalsRepository(private val driver: Driver) {
    fun findTop3ByMostBenefitsWithinYear(year: String): List<Pair<HealthProfessional, AggregatedBenefits>> {
        // Consume the result before the session is closed
        return driver.session(AccessMode.READ).use {
            val parameters = mapOf(Pair("year", year))
            it.run("""
                MATCH (:Year {year: {year}})<-[:MONTH_IN_YEAR]-(:Month)<-[:DAY_IN_MONTH]-(d:Day),
                      (bt:BenefitType)<-[:HAS_BENEFIT_TYPE]-(b:Benefit)-[:GIVEN_AT_DATE]->(d),
                      (lab:Company)-[:HAS_GIVEN_BENEFIT]->(b)-[:HAS_RECEIVED_BENEFIT]->(hp:HealthProfessional),
                      (hp)-[:SPECIALIZES_IN]->(ms:MedicalSpecialty)
                WITH ms, hp, SUM(toFloat(b.amount)) AS total_amount, COLLECT(DISTINCT lab.name) AS labs,
                     COLLECT(bt.type) AS benefit_types
                ORDER BY total_amount DESC
                RETURN ms {.code, .name}, hp {.first_name, .last_name}, total_amount, labs, benefit_types
                LIMIT 3
                """.trimIndent(), parameters)
                .list()
                .map(this::toAggregatedHealthProfessionalBenefits)
        }
    }
}
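To make the aggregation in this query easier to follow, here is a rough in-memory equivalent. It is only an illustration: `BenefitRecord`, `Aggregate` and `top3` are hypothetical names, the medical specialty is left out, and the real work is done by Neo4j, not by application code.

```kotlin
// Hypothetical flat view of one benefit given by a lab to a health professional.
data class BenefitRecord(val professional: String, val lab: String, val type: String, val amount: Double)

data class Aggregate(
    val professional: String,
    val totalAmount: Double,       // SUM(toFloat(b.amount))
    val labs: Set<String>,         // COLLECT(DISTINCT lab.name)
    val benefitTypes: List<String> // COLLECT(bt.type)
)

fun top3(benefits: List<BenefitRecord>): List<Aggregate> =
    benefits.groupBy { it.professional }      // grouping keys of the WITH clause
        .map { (professional, rows) ->
            Aggregate(
                professional,
                rows.sumOf { it.amount },
                rows.map { it.lab }.toSet(),
                rows.map { it.type }
            )
        }
        .sortedByDescending { it.totalAmount } // ORDER BY total_amount DESC
        .take(3)                               // LIMIT 3
```

Each step maps to one clause of the Cypher query: grouping by the non-aggregated keys, aggregating, sorting by the total, then keeping the first three rows.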
97. “Nothing is ever finished” - TODO list
● Optimize the import
● Use Spring Data Neo4j
● Use “graphier” algorithms (shortest paths, PageRank…)
● Expose a GraphQL API - http://grandstack.io/