Personal social network search and mining
1. Personal Social Network: A New Approach to Personal Network Search Based on Information Extraction. Jie Tang, Mingcai Hong, Jing Zhang, Bangyong Liang, and Juanzi Li. Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University. Sep. 5th, 2006
4. Processing Flow. [Diagram] Query: submitted to the search engine. Returned pages: fed to the Classification Model. Extracting and saving to the Ontology base.
6. Annotation using SVMs. Identified info: personal profile (e.g. image, affiliation, etc.) and contact information (fax, email, phone, etc.). Models: start position model and end position model, each built on its own feature set.
7. Person Search. Search for a person by name or by other information, e.g. affiliation.
10. Association Search. Finding associations between persons: high efficiency; top-K associations. Usage: to find a partner; to find a person with the same interests.
Welcome to Tsinghua, Professor Dieter. We hope to get your advice and guidance. I am Li Juanzi from the Knowledge Engineering Group in the Department of Computer Science and Technology. I thank Professor Yang for giving me this opportunity to introduce our work on the Semantic Web and web services.
This is the processing flow of the contact search. After the user inputs a person's name, the system first queries the database. If the database has the contact information of that person, the system returns it directly. If not, the system submits the person's name to Google. From the documents returned by Google, we take the top-ranked 50 and feed them to a classifier. Our statistics show that more than 90% of the personal information is located in the top-ranked 20 documents and more than 95% is located in the top-ranked 50 documents. The classifier identifies whether or not a document contains personal information. Finally, we use an SVM-based method for the extraction and save the extracted data into the database.
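The flow described above can be sketched in a few lines of Python. This is only an illustrative outline, not the system's actual code: the function name `contact_search`, the callables `search`, `contains_personal_info` and `extract`, and the rule for merging fields from several documents are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Optional

TOP_K = 50  # the notes consider the top-ranked 50 returned documents


def contact_search(
    name: str,
    database: Dict[str, dict],
    search: Callable[[str], List[str]],             # hypothetical search-engine wrapper
    contains_personal_info: Callable[[str], bool],  # hypothetical document classifier
    extract: Callable[[str], dict],                 # hypothetical SVM-based extractor
) -> Optional[dict]:
    # 1. Return stored contact information directly when the person is known.
    if name in database:
        return database[name]

    # 2. Otherwise submit the name to the search engine; keep the top-ranked documents.
    documents = search(name)[:TOP_K]

    # 3. Keep only documents the classifier flags as containing personal information.
    relevant = [doc for doc in documents if contains_personal_info(doc)]
    if not relevant:
        return None

    # 4. Extract fields from each relevant document.
    #    Assumption: when two documents disagree, the higher-ranked one wins.
    profile: dict = {}
    for doc in relevant:
        for field, value in extract(doc).items():
            profile.setdefault(field, value)

    # 5. Save the extracted data so the next query is answered from the database.
    database[name] = profile
    return profile
```

Passing the three stages in as callables keeps the sketch independent of any particular search API or model, so each stage can be swapped or stubbed out on its own.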
In non-text filtering, we use similar methods for header, signature, and program code detection. In these methods, we view each text line in an email as an instance for the SVM, and for each instance we define a set of features. The method consists of two stages: training and detection. We use the header as an example to explain how we conduct non-text block detection. In training, we take the training data as input and define two sets of features, one for header start line detection and one for header end line detection. We then use the two feature sets to construct two SVM models. In detection, we use the two SVM models to identify whether or not a line is the start line of a header, and whether or not a line is the end line of a header. We then view the lines from the identified start line to the identified end line as a header. Defining effective features is therefore one of our main focuses.
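The detection stage can be sketched as follows. This is a minimal illustration under stated assumptions: the features in `line_features` are invented for the example, and `toy_start_model` and `toy_end_model` are hand-written rules standing in for the two trained SVM models, which the notes do not specify in detail.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical feature: does the line look like "From: ..." or "Subject: ..."?
FIELD = re.compile(r"^[A-Za-z-]+:\s")

Model = Callable[[Dict[str, bool]], bool]


def line_features(lines: List[str], i: int) -> Dict[str, bool]:
    """Illustrative features for line i; the real feature sets are not given."""
    def has_field(j: int) -> bool:
        return 0 <= j < len(lines) and bool(FIELD.match(lines[j]))

    return {
        "has_field": has_field(i),
        "prev_has_field": has_field(i - 1),
        "next_has_field": has_field(i + 1),
    }


def detect_blocks(lines: List[str], start_model: Model, end_model: Model) -> List[Tuple[int, int]]:
    """Pair each detected start line with the next detected end line (inclusive)."""
    blocks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(len(lines)):
        feats = line_features(lines, i)
        if start is None and start_model(feats):
            start = i
        if start is not None and end_model(feats):
            blocks.append((start, i))
            start = None
    return blocks


# Stand-ins for the two trained SVM models (assumptions, not learned classifiers).
def toy_start_model(f: Dict[str, bool]) -> bool:
    return f["has_field"] and not f["prev_has_field"]


def toy_end_model(f: Dict[str, bool]) -> bool:
    return f["has_field"] and not f["next_has_field"]


email = ["From: a@x.org", "To: b@y.org", "Subject: hi", "", "Hello,", "see you."]
print(detect_blocks(email, toy_start_model, toy_end_model))  # [(0, 2)]
```

In a trained setting, each stand-in would be replaced by a classifier's prediction on the feature vector for that line. The pairing logic in `detect_blocks` stays the same.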
That is all for my introduction to our lab. Thank you all.