When knowledge and enterprise are invested, what do you call it?
Answers
Business challenges
There are two challenges on the business side.
- Data Privacy: how to provide deep and useful analysis services without violating the privacy of a company and its employees.
- Killer Services on the Graph: the EKG is complex and huge, so making the graph easy to use is a challenge.
Solutions to the two challenges:
Data Privacy:
- Transform the original data into rank form or ratio form instead of exposing the exact values (rank form or ratio form?)
- Obscure critical nodes (e.g., person-related information) which should not be shown when visualizing the EKG as a graph
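As a rough illustration of the rank and ratio transformations mentioned above, here is a minimal sketch. The function names and the sample figures are hypothetical, and the exact definitions (rank 1 for the largest value, ratio as a share of the total) are assumptions rather than the project's actual method.

```python
def to_rank_form(values):
    """Replace each value with its rank (1 = largest); ties share a rank."""
    distinct_desc = sorted(set(values), reverse=True)
    rank_of = {v: i + 1 for i, v in enumerate(distinct_desc)}
    return [rank_of[v] for v in values]


def to_ratio_form(values):
    """Replace each value with its share of the total."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]


# Hypothetical registered-capital figures for four companies.
capital = [500, 1500, 1000, 1000]
ranks = to_rank_form(capital)    # ordering survives, amounts do not
ratios = to_ratio_form(capital)  # proportions survive, amounts do not
print(ranks)
print(ratios)
```

Either form still supports comparisons between companies, which is presumably what the analysis services need, while the raw figures never leave the source system.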
Killer Services on the Graph:
- Deliver services that directly meet the business requirements of users. For example, the service for finding an enterprise's real controllers tells investors from investment banks who the real owners of a company are, and the enterprise path discovery service provides hints on how investors could reach the enterprises they want to invest in.
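To make the path discovery idea concrete, here is a minimal sketch that runs a breadth-first search over a tiny in-memory graph. The entity names, relation names and the choice of BFS are all assumptions for illustration; the source does not say how the real service is implemented.

```python
from collections import deque


def find_path(edges, start, goal):
    """Return a shortest path as a list of nodes, or None if unreachable."""
    # Treat every relation as undirected for the purpose of "reaching" a node.
    neighbours = {}
    for a, _relation, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical (subject, relation, object) edges.
edges = [
    ("Investor", "invests_in", "CompanyA"),
    ("PersonX", "director_of", "CompanyA"),
    ("PersonX", "shareholder_of", "TargetCo"),
    ("CompanyB", "supplier_of", "TargetCo"),
]
path = find_path(edges, "Investor", "TargetCo")
print(path)
```

In this toy graph the investor reaches the target through a company it already holds and a person who sits on that company's board. On a graph with over a billion triples, an in-memory BFS like this would not be practical as-is.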
Technology challenges
Technology challenges arise from the diversity and the scale of the data sources.
- Construction problems, such as transforming relational databases to RDF (D2R) and the difficulty of representing and querying meta properties and n-ary relations
- Performance issues, since the KG contains more than one billion triples
Before describing the challenges in detail, let's first look at the whole workflow for building the EKG.
In the first stage of our project, we mainly use relational databases (RDBs) from CSAIC.
Secondly, we supplement the EKG with bidding information from the Chinese Government Procurement Network (CGPN) and stock information from Eastern Wealth Network (EWN).
Then the EKG is fused with the patent information extracted from the Patent Search and Analysis Network of State Intellectual Property Office (PASN-SIPO) in another project.
Finally, competitor relations and acquisition events are added to the EKG. This information is extracted from encyclopedia sites, namely Wikipedia, Baidu Baike and Hudong Baike.
The following challenges are encountered during the above process:
Data Model (Complex data types): the model must cover meta properties (properties of relations, as in a property graph) and events (n-ary relations). However, there is no mature existing solution for representing and querying meta properties and events efficiently.
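To show what these two data types look like once flattened into plain triples, here is a minimal sketch. It uses standard RDF reification (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) for the meta property and an intermediate event node for the n-ary relation. This is a well-known baseline, not the project's solution, and the `ex:` names and values are hypothetical.

```python
def reify(subject, predicate, obj, statement_id, meta):
    """Expand one edge plus its meta properties into plain triples."""
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]
    # Each meta property hangs off the statement node, not off the subject.
    triples += [(statement_id, key, value) for key, value in meta.items()]
    return triples


def event_node(event_id, event_type, roles):
    """Model an n-ary relation as an event node with one triple per role."""
    return [(event_id, "rdf:type", event_type)] + [
        (event_id, role, filler) for role, filler in roles.items()
    ]


# A shareholding edge carrying a hypothetical share ratio as meta property.
holding = reify(
    "ex:PersonX", "ex:holdsSharesIn", "ex:CompanyA",
    "ex:stmt1", {"ex:shareRatio": 0.3},
)

# An acquisition event with three participants/attributes (made-up values).
acquisition = event_node(
    "ex:acq1", "ex:Acquisition",
    {"ex:acquirer": "ex:CompanyA", "ex:target": "ex:CompanyB", "ex:year": 2015},
)
print(len(holding), len(acquisition))
```

The sketch also hints at why efficiency is a concern: a single edge with one meta property becomes five triples, and any query over it needs several joins on the statement node.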
D2R Mapping: using D2R tools (e.g., D2RQ) to map the RDBs from CSAIC into RDF raises the following challenges: a) mapping meta properties; b) data in the same RDB column may map to different classes in RDF; c) data in the same RDB table may map to different classes that have subClass relations.
Information Extraction: extract useful relations of various types, such as "competitor" and "acquisition". Entity extraction becomes difficult when company names are abbreviated on the encyclopedia sites.
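Challenge b) can be illustrated with a small hand-written mapping. The sketch below is plain Python, not D2RQ mapping syntax, and the `shareholder` table, its columns and the `holder_type` discriminator are all hypothetical; it only shows why a one-column-to-one-class mapping is not enough.

```python
def row_to_triples(row):
    """Map one row of a hypothetical `shareholder` table to triples."""
    holder = f"ex:holder/{row['holder_id']}"
    # The same column holds both persons and companies, so the RDF class
    # is chosen per row from a discriminator column (an assumption).
    rdf_class = "ex:Person" if row["holder_type"] == "P" else "ex:Company"
    return [
        (holder, "rdf:type", rdf_class),
        (holder, "ex:name", row["holder_name"]),
        (holder, "ex:holdsSharesIn", f"ex:company/{row['company_id']}"),
    ]


# Two made-up rows whose holder columns refer to different kinds of entity.
rows = [
    {"holder_id": 1, "holder_type": "P", "holder_name": "Person X", "company_id": 10},
    {"holder_id": 2, "holder_type": "C", "holder_name": "Company B", "company_id": 10},
]
triples = [t for row in rows for t in row_to_triples(row)]
for t in triples:
    print(t)
```

When no discriminator column exists, the class would have to be inferred from the value itself, which is harder to express in a declarative mapping.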