CSE435/535 Project 3: Evaluation of IR Models


The goal of this project is to implement various IR models, evaluate the IR system, and improve the search results based on your understanding of the models, the implementation, and the evaluation.

You are given Twitter data in three languages (English, German, and Russian), 15 sample queries, and the corresponding relevance judgments. You will index the given Twitter data with Solr, implement the Vector Space Model and BM25 on top of Solr, and evaluate the two sets of results using the trec_eval program. Based on the evaluation results, you are asked to improve performance in terms of Mean Average Precision (MAP).

The following sections describe the tasks involved, the evaluation criteria, and the submission guidelines.

Section 1: Dataset

[provided file: train.json]

The data given is Twitter data saved in JSON format, train.json. Three languages are included: English (text_en), German (text_de), and Russian (text_ru).

train.json: This file contains the tweets with some fields extracted from the raw data. A sample tweet has the following format:

{
  "lang": ...,
  "id": ...,
  "text_de": ...,
  "text_en": ...,
  "text_ru": ...,
  "tweet_urls": [ ],
  "tweet_hashtags": [ ]
}
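For a quick sanity check of the data, a short Python sketch along the following lines can load the file and count the tweets per language. It assumes train.json holds a single JSON array of tweet objects; if the file turns out to be line-delimited, parse it one json.loads() per line instead.

import json
from collections import Counter

# Load the tweets (assumption: train.json is a single JSON array of tweet objects).
with open("train.json", encoding="utf-8") as f:
    tweets = json.load(f)

# Count tweets per language and peek at the fields of one record.
lang_counts = Counter(t.get("lang") for t in tweets)
print("Tweets per language:", dict(lang_counts))
print("Sample tweet fields:", sorted(tweets[0].keys()))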

 

 

 

 

Section 2: Implementing IR models

[provided files: queries.txt, qrel.txt, sample_trec_input.txt, json_to_trec.py]

 

Index

In this step, you will need to index the data as you did in Project 1.
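If you prefer to index programmatically rather than through the admin UI or the bin/post tool, a minimal Python sketch using the requests library is given below. The core name my_core and the assumption that train.json is a single JSON array are placeholders to adapt to your setup.

import json
import requests

SOLR_UPDATE_URL = "http://localhost:8983/solr/my_core/update"  # replace my_core with your core name

with open("train.json", encoding="utf-8") as f:
    tweets = json.load(f)  # assumption: a single JSON array of tweet objects

# Post the documents to Solr and commit so they become searchable immediately.
resp = requests.post(SOLR_UPDATE_URL, params={"commit": "true"}, json=tweets)
resp.raise_for_status()
print("Indexed", len(tweets), "documents")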

 

Various IR models

In this step, you will need to implement the Vector Space Model (VSM) and BM25 (note that Solr version 6.0 and above uses the BM25 model by default). In Solr, these models are implemented through a predefined class called "Similarity"; a schema sketch is shown after the reference links below.

 

Here are some useful links for your reference:

 

  • All similarity classes that you can choose from in Solr, which means that very likely you do NOT need to implement an IR model from scratch: http://lucene.apache.org/solr/7_0_0/solr-core/org/apache/solr/search/similarities/package-summary.html

 

  • To specify and customize different similarity functions in Solr Schema:

http://lucene.apache.org/solr/guide/7_5/otherschemaelements.html#OtherSchemaElementsSimilarity

http://wiki.apache.org/solr/SchemaXml#Similarity
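For reference, switching the similarity in a Solr 7.x schema typically looks like the sketch below. The parameter values shown are just Solr's documented defaults; check the links above for the exact options available in your Solr version.

<!-- BM25 as the global similarity, with example parameter values (Solr's defaults are k1=1.2, b=0.75). -->
<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.2</float>
  <float name="b">0.75</float>
</similarity>

<!-- Alternatively, classic TF-IDF (vector space model) similarity: -->
<!-- <similarity class="solr.ClassicSimilarityFactory"/> -->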

 

 

Input Queries

You are provided with 15 sample queries (queries.txt) and the corresponding manually judged relevance scores (qrel.txt).

queries.txt includes 15 sample queries, one query per line. Each line has the following format:

query_number query_text

For example,

001 Russia's intervention in Syria

Your retrieval result is mainly based on the query_text.

 

qrel.txt includes the manually judged relevance scores. Each line has the following format:

query_number 0 document_id relevance

For example,

001 0 653278482517110785 0
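A small helper along the following lines can read both files into Python structures for later use; it is only a sketch, assuming whitespace-separated fields exactly as in the examples above.

def load_queries(path="queries.txt"):
    """Return {query_number: query_text} from 'query_number query_text' lines."""
    queries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                qnum, qtext = line.split(maxsplit=1)
                queries[qnum] = qtext
    return queries

def load_qrels(path="qrel.txt"):
    """Return {(query_number, document_id): relevance} from 'qnum 0 doc_id rel' lines."""
    qrels = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 4:
                qnum, _, doc_id, rel = parts
                qrels[(qnum, doc_id)] = int(rel)
    return qrels

print(len(load_queries()), "queries,", len(load_qrels()), "judged pairs")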

 

 

 

Query result of Solr

The query results of Solr can be returned in JSON format and should include at least the fields id and score.

 

For example, you can use a query link like this:

http://localhost:8983/solr/corename/select?q=*%3A*&fl=id%2Cscore&wt=json&indent=true&rows=20

to get the score and id (Note: replace "localhost" with your hostname and "corename" with the name of your Solr core). For more query parameters, please check https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters
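Instead of editing the URL by hand, you can issue the same request from Python. The sketch below is an example only; the hostname, core name, and the choice of text_en as the searched field are assumptions you will need to adjust.

import requests

SOLR_SELECT_URL = "http://localhost:8983/solr/my_core/select"  # adjust host and core name

def search(query_text, rows=20):
    """Query Solr and return a list of (id, score) pairs, highest score first."""
    params = {
        "q": "text_en:({})".format(query_text),  # assumption: the English text field is searched
        "fl": "id,score",
        "wt": "json",
        "rows": rows,
    }
    resp = requests.get(SOLR_SELECT_URL, params=params)
    resp.raise_for_status()
    return [(d["id"], d["score"]) for d in resp.json()["response"]["docs"]]

print(search("Russia intervention in Syria")[:5])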

 

The query results should be processed into the format below to match the input format of the TREC evaluation program. A Python script (json_to_trec.py) is provided to help you accomplish this task.

 

The final result of the search system should be a ranked list of documents as returned by the retrieval system. It should have the following format,

query-number Q0 tweet_id rank similarity_score model_name

For example,

001 Q0 653278466788487168 0 0.22385858 default

where,

001 is the query number;

Q0 is a constant, ignored in TREC evaluation;

653278466788487168 is the document id (in this case, the tweet_id);

0 is the rank of this document for query 001;

0.22385858 is the similarity score returned by the IR model (here BM25, which is the default in Lucene);

default is the name of the model you used.

 

A sample file is provided in file sample_trec_input.txt.
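If you want to see the idea behind the conversion in miniature (this is a sketch, not the provided json_to_trec.py script), the transformation essentially turns a score-sorted list of (id, score) pairs into the six-column lines shown above:

def to_trec_lines(query_number, results, model_name="default"):
    # results: list of (doc_id, score) pairs already sorted by descending score.
    lines = []
    for rank, (doc_id, score) in enumerate(results):
        lines.append("{} Q0 {} {} {} {}".format(query_number, doc_id, rank, score, model_name))
    return lines

# Example: append one query's results in the required format (the scores here are illustrative).
results = [("653278466788487168", 0.22385858), ("653278482517110785", 0.19)]
with open("my_trec_input.txt", "a", encoding="utf-8") as out:
    out.write("\n".join(to_trec_lines("001", results)) + "\n")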

 

NOTE: For the final submission, we ask you to restrict the (maximum) number of returned documents to 20, i.e., add "rows=20" to each query URL.

 

Section 3: TREC Evaluation

[provided files: qrel.txt, sample_trec_output.txt]

 

In this part, you will be using the trec_eval program. You can download the latest version from http://trec.nist.gov/trec_eval/. After downloading, read the README file carefully. One of the basic commands is

trec_eval -q -c -M1000 official_qrels submitted_results

 

For example, you can use the following command to evaluate the sample query output file:

trec_eval -q -c -M 1000 qrel.txt sample_trec_input.txt

 

This command will give you a number of common evaluation measure results.

 

For more information on how to use or interpret the result, go to

http://www-nlpir.nist.gov/projects/t01v/trecvid.tools/trec_eval_video/A.README

 

A sample TREC_eval output file is provided in file sample_trec_output.txt.
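To compare runs quickly, you can call trec_eval from Python and pull out the overall MAP value. The sketch below assumes the trec_eval binary is on your PATH, that its output uses the usual "measure  query  value" columns, and that vsm_results.txt and bm25_results.txt are the names of your own result files.

import subprocess

def get_map(qrel_path, run_path):
    # Run trec_eval and return the overall MAP value (reported for query id "all").
    output = subprocess.run(
        ["trec_eval", "-q", "-c", "-M", "1000", qrel_path, run_path],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "map" and fields[1] == "all":
            return float(fields[2])
    raise ValueError("MAP not found in trec_eval output")

print("VSM MAP:", get_map("qrel.txt", "vsm_results.txt"))
print("BM25 MAP:", get_map("qrel.txt", "bm25_results.txt"))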

 

Section 4: Improving IR system

 

With your training queries, query results, ground-truth judgments, and the trec_eval results, you should by now have an intuition about the performance of your IR system. We choose MAP as the main measure to improve. Here is a list of things you could try to improve your evaluation score.

 

  1. Understand the measure itself. How can you improve MAP?
  2. Do you need to do advanced query processing to improve the results? For example, boosting the query based on different fields, expanding the query (say, translating it into the other languages), using a different query parser, or using filters during query processing? A field-boosting sketch is given after this list. More details can be found at

https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser

  3. Do you need a better index? For example, do you need additional fields with additional analyzers and tokenizers to achieve better query results? See, for example, http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_exactcase_matches_score_higher
  4. Do you need to tweak the parameters of the IR model to make it more suitable to the queries? For example, the BM25 model has two parameters you can set. What is the meaning of these parameters, and how should you tweak them?
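As one concrete illustration of field boosting (point 2 above), the eDisMax query parser lets you weight fields directly in the request. The sketch below is only an example; the host, core name, field names, and boost values are assumptions you would tune yourself.

import requests

SOLR_SELECT_URL = "http://localhost:8983/solr/my_core/select"  # adjust host and core name

params = {
    "q": "Russia intervention in Syria",
    "defType": "edismax",               # use the eDisMax query parser
    "qf": "text_en^2 text_de text_ru",  # boost the English field; names and boosts are examples
    "fl": "id,score",
    "wt": "json",
    "rows": 20,
}
resp = requests.get(SOLR_SELECT_URL, params=params)
resp.raise_for_status()
for doc in resp.json()["response"]["docs"]:
    print(doc["id"], doc["score"])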

 

In your report, list all the methods you use to improve the model and your observations.

 

Section 5: Grading Criteria and Submission

 

This project is worth 15 points in total. We will evaluate your work along three aspects:

  1. If you have successfully submitted and implemented the two models with default settings, you get 7 points (3+2+2). The default settings for each model can be found at http://lucene.apache.org/solr/7_0_0/solr-core/org/apache/solr/search/similarities/package-summary.html
  2. 3 points are given based on the work you have done to improve performance. We will read your report and judge its quality.
  3. The remaining 5 points are given based on the performance (mainly MAP) of your best-effort systems (among three models) on the test queries. We will quantify the performance across the whole class; the top 30% will get the full 5 points, the next quantile 4 points, etc.

 

About one week before the deadline, you will be given 5 test queries. You will be asked to provide the query results in the same format as sample_trec_input.txt, for each query and each model.

 

You also need to submit a report, which should include the following:

  1. Describe how you implemented each model in Solr and provide screenshots of your key implementation steps and results to demonstrate that you have successfully implemented them.
  2. What have you done to improve the performance in terms of MAP? List what you have done, one by one, and create a table showing MAP scores before and after each change. You can also use plots for the comparison.

 

Your report should be brief and technically oriented; we do not judge it based on its length.
