CI6226 Assignment 2-Search Engine Solved

You are provided a dataset for this assignment, which you are free to use.  You can use your own dataset as well.

Assignment

In this assignment we continue building an information retrieval system on top of Assignment 1, which produced a system that outputs a sorted list of term-document pairs.

The tasks in this assignment are:

  • Build an inverted index;
  • Enable simple Boolean search;
  • Implement compression techniques.

1. Inverted Index

Input: a file with sorted term-doc pairs

Output: an inverted index

In this part you need to take the file containing the sorted list of term-doc pairs and transform it into a simple inverted index. You may assume that both the list and the inverted index fit in main memory, although you are welcome to handle the case where they do not.
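The transformation can be sketched as follows. This is a minimal illustration, not a required design: it assumes each input line holds one whitespace-separated pair of the form `term doc_id` with an integer document ID, which may differ from the format your Assignment 1 produces. The function name `build_inverted_index` is hypothetical.

```python
from itertools import groupby


def build_inverted_index(lines):
    """Return {term: sorted list of unique doc IDs} from sorted 'term doc_id' lines."""
    # Assumed line format: "<term> <doc_id>"; blank lines are skipped.
    pairs = (line.split() for line in lines if line.strip())
    index = {}
    # The input is sorted by term, so all pairs for one term are adjacent
    # and groupby can collect them in a single pass.
    for term, group in groupby(pairs, key=lambda pair: pair[0]):
        index[term] = sorted({int(doc_id) for _, doc_id in group})
    return index


sample = ["car 1", "car 3", "horse 1", "horse 2", "horse 3", "phone 3"]
index = build_inverted_index(sample)
```

For a real file, the same function can be called with an open file object, since it only iterates over lines.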

Bonus Points: persist the inverted index as a file so you don’t need to rebuild it every time you launch the program.
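One possible way to persist the index is to serialize it with Python's standard `json` module. The sketch below assumes the index is a dictionary mapping terms to lists of integer document IDs; the helper names `save_index` and `load_index` and the file name `index.json` are illustrative only.

```python
import json
import os
import tempfile


def save_index(index, path):
    """Write the index to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Read an index previously written by save_index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


index = {"car": [1, 3], "horse": [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), "index.json")  # hypothetical location
save_index(index, path)
restored = load_index(path)
```

JSON keeps the file human-readable, which helps with debugging. A binary format would be more compact and is a natural fit once the compression part is in place.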

2. Boolean Search

Input: a search query, an inverted index

Output: a list of documents satisfying the query

Implement a simple AND-based Boolean search, i.e., a query “horse car phone” should be treated as “horse AND car AND phone” and return only documents that contain all three words.
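A minimal sketch of AND search over sorted postings lists is shown below. It assumes the index is a dictionary from lowercase terms to sorted lists of document IDs; the names `intersect` and `and_search` are hypothetical. The merge walks two sorted lists in parallel, and the shortest lists are intersected first to keep intermediate results small.

```python
def intersect(a, b):
    """Intersect two sorted lists of doc IDs with a linear merge."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def and_search(query, index):
    """Treat a whitespace-separated query as term1 AND term2 AND ..."""
    terms = query.lower().split()
    if not terms:
        return []
    # A term missing from the index has an empty postings list.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = list(postings[0])
    for other in postings[1:]:
        if not result:
            break  # nothing can match any more
        result = intersect(result, other)
    return result


index = {"horse": [1, 2, 3], "car": [1, 3], "phone": [3]}
hits = and_search("horse car phone", index)
```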

Bonus Points: Implement OR and NOT in addition to AND.
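OR and NOT can be expressed as union and complement over postings lists. The sketch below uses Python sets as a shortcut rather than a linear merge, and it assumes NOT is evaluated against the set of all known document IDs. The assignment does not fix a query syntax, so only the set operations are shown; the names `union` and `negate` are hypothetical.

```python
def union(a, b):
    """Doc IDs that appear in either postings list (OR)."""
    return sorted(set(a) | set(b))


def negate(a, all_docs):
    """Doc IDs in the collection that are not in the postings list (NOT)."""
    return sorted(set(all_docs) - set(a))


index = {"horse": [1, 2, 3], "car": [1, 3], "phone": [3, 4]}
# The universe of documents, derived here from the index itself.
all_docs = sorted({d for postings in index.values() for d in postings})

car_or_phone = union(index["car"], index["phone"])
not_car = negate(index["car"], all_docs)
# "horse AND NOT car" avoids building the full complement.
horse_not_car = sorted(set(index["horse"]) - set(index["car"]))
```

A full complement can be as large as the whole collection, so combining NOT with an AND operand, as in the last line, is usually cheaper.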

3. Index Compression/Optimization

Implement compression and optimization techniques that were discussed in the lectures.  In particular, implement at least the Dictionary-as-a-String approach.  Implementing other techniques (blocking, front-coding, skip pointers, variable-length gap encoding) is encouraged.
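As an illustration of the Dictionary-as-a-String idea, the sketch below concatenates all sorted terms into one string and keeps an array of start offsets (term pointers) instead of one string object per term. Lookup is a binary search over the offsets. The class name `StringDictionary` and its methods are hypothetical, and the lecture version may differ in details such as pointer width.

```python
from array import array


class StringDictionary:
    """Sorted terms stored as one long string plus an array of offsets."""

    def __init__(self, sorted_terms):
        self.offsets = array("I")  # compact unsigned-int term pointers
        pos = 0
        for term in sorted_terms:
            self.offsets.append(pos)
            pos += len(term)
        self.offsets.append(pos)  # sentinel marking the end of the last term
        self.text = "".join(sorted_terms)

    def term_at(self, i):
        """Recover term i from its start offset and the next term's offset."""
        return self.text[self.offsets[i]:self.offsets[i + 1]]

    def lookup(self, term):
        """Return the term's position in the dictionary, or -1 if absent."""
        lo, hi = 0, len(self.offsets) - 2
        while lo <= hi:
            mid = (lo + hi) // 2
            candidate = self.term_at(mid)
            if candidate == term:
                return mid
            if candidate < term:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1


dictionary = StringDictionary(["car", "horse", "phone"])
position = dictionary.lookup("horse")
```

The returned position can index a parallel list of postings lists, so the terms themselves no longer need to be stored as dictionary keys.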

Compare your search engine performance and memory requirements before and after implementing compression and optimizations.  Reflect the comparison in your report.
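One rough way to collect numbers for the comparison is sketched below. It assumes the index is a plain dictionary of lists and that the search function takes a single query string; `deep_size` and `time_queries` are hypothetical helpers. The size figure is only an estimate, because CPython shares some small objects between containers.

```python
import sys
import time


def deep_size(index):
    """Approximate memory used by a {term: [doc IDs]} index, in bytes."""
    total = sys.getsizeof(index)
    for term, postings in index.items():
        total += sys.getsizeof(term) + sys.getsizeof(postings)
        total += sum(sys.getsizeof(doc_id) for doc_id in postings)
    return total


def time_queries(search, queries, repeats=100):
    """Average seconds per query over several repetitions."""
    start = time.perf_counter()
    for _ in range(repeats):
        for query in queries:
            search(query)
    return (time.perf_counter() - start) / (repeats * len(queries))


index = {"car": [1, 3], "horse": [1, 2, 3]}
size_bytes = deep_size(index)
avg_seconds = time_queries(lambda q: index.get(q, []), ["car", "horse"])
```

Running the same two measurements on the uncompressed and compressed versions, with the same queries, gives directly comparable figures for the report.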

Bonus Points: If you implement many techniques and manage to achieve impressive savings in speed and/or memory, that may earn you bonus points.
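Among the optional techniques named in this section, variable-length gap encoding is one that often gives visible savings. The sketch below stores the gaps between consecutive document IDs using variable byte codes, where the high bit marks the last byte of each number. It assumes postings are sorted positive integers; the function names are hypothetical.

```python
def vb_encode_number(n):
    """Encode one non-negative integer as variable-byte code."""
    chunks = []
    while True:
        chunks.insert(0, n % 128)  # 7 payload bits per byte
        if n < 128:
            break
        n //= 128
    chunks[-1] += 128  # set the high bit on the final byte
    return bytes(chunks)


def encode_postings(doc_ids):
    """Encode a sorted postings list as variable-byte coded gaps."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        out += vb_encode_number(doc_id - previous)
        previous = doc_id
    return bytes(out)


def decode_postings(data):
    """Invert encode_postings: rebuild the doc IDs from the gap bytes."""
    doc_ids = []
    n = 0
    previous = 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            n = 128 * n + (byte - 128)
            previous += n  # add the gap to the running doc ID
            doc_ids.append(previous)
            n = 0
    return doc_ids


postings = [824, 829, 215406]
encoded = encode_postings(postings)
decoded = decode_postings(encoded)
```

Here the three document IDs take six bytes in total, since the gaps 824, 5, and 214577 need two, one, and three bytes respectively.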
