Volume : 2, Issue : 5, MAY 2016

FREQUENCY BASED SEMANTIC ANNOTATION SCHEME WITH HIGH PERFORMANCE

Amit Kumar Yadav

Abstract

Annotation of web pages is an area of research that is receiving a lot of attention as the number of websites, both on specific topics and overall, grows rapidly. Databases are now widely accessible over the web through HTML representations, and data extraction over the web is becoming increasingly dynamic. The volume of such data is huge, and it serves applications such as online shopping comparison and article collection. Annotating the collected information offers several advantages, including faster decision making, access to relevant information, less time spent on futile searches, historical data management, and elimination of older searches. For annotation, web pages are examined for content type, presentation style, data type, tag path, and adjacency of the contents.

This paper provides insight into annotation techniques and applies one of them to obtain the required results with the advantages stated above. Previous work on data annotation has concentrated on a limited set of tokens and has focused only on creating dynamic annotations. This work proposes applying dynamic annotations to website data, with tokenization performed over all sorts of tokens, including long text that has no specific tokens.

Keywords

Data Annotation, Web Databases, data alignment, data filtering, frequency annotation, multi mode text.

Article No : 15
