UE803- Data Science Project 1 Solved


Insolvency is the state in which a person or company is unable to pay money owed on time; those in a state of insolvency are said to be insolvent.

❑ Balance-sheet insolvency is when a person or company does not have enough assets to pay all of their debts. The person or company might enter bankruptcy, but not necessarily.

▪ If a loss is accepted by all parties, negotiation is often able to resolve the situation without bankruptcy.

❑ Liquidation is the process in accounting by which a company is brought to an end.

❑ A company that has been brought to an end is said to be dissolved.

SDS Project 2

Business Status

❑ Active – live and doing business
❑ Active (default of payments) – balance-sheet insolvency

❑ Active (receivership) – a trustee is legally appointed to act as the custodian of a company

❑ Bankruptcy – in the process of bankruptcy

❑ In liquidation – being closed (not for bankruptcy)

❑ Dissolved – closed


Bankruptcy/Failure prediction

Bankruptcy prediction consists of predicting bankruptcy and other financial distresses/losses.

❑ Will a firm go into a bankruptcy/liquidation/dissolved state?
❑ When will it happen?

Approaches:

❑Parametric methods: curve fitting, statistical tests, regression, survival analysis, …

❑Non-parametric (machine learning): decision trees, neural networks, ensembles, …

➢ Preliminary setting: fix a status subset as the notion of “failure”
➢ E.g., Failure is Status == ‘Bankruptcy’
➢ E.g., Failure is Status != ‘Active’
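
The two example definitions of failure can be written as small labelling functions. This is a minimal sketch: the status strings are taken from the Business Status list, while the function names and the sample list are hypothetical and only serve as illustration.

```python
# Narrow notion of failure: only the 'Bankruptcy' status counts.
# The set below is an assumption; any status subset can be plugged in.
FAILURE_STATUSES = {"Bankruptcy"}


def is_failure(status, failure_statuses=FAILURE_STATUSES):
    """Return True when the status belongs to the chosen failure subset."""
    return status in failure_statuses


def is_not_active(status):
    """Broad notion of failure: any status other than plain 'Active'."""
    return status != "Active"


# Hypothetical sample of status values.
statuses = ["Active", "Bankruptcy", "In liquidation", "Dissolved"]
narrow = [is_failure(s) for s in statuses]
broad = [is_not_active(s) for s in statuses]
```

Note that under the broad definition a status such as 'Active (default of payments)' also counts as a failure, because it differs from 'Active'.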


Question(s) (A)

Compare the distributions of size/age/other between failed and active companies in a specific year.

❑ Do they change for a specific company form (SPA, SRL, etc.)?
❑ Do they change for a specific industry sector (see ATECO sectors)?

I.e., is there a statistically significant difference?
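
One possible way to test for a difference between two distributions is a two-sample Kolmogorov-Smirnov statistic with a permutation p-value. The sketch below uses only the standard library; the choice of test, the function names, and the toy size values are assumptions, not something prescribed by the project.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in a_sorted + b_sorted
    )


def permutation_p_value(a, b, n_perm=500, seed=0):
    """Share of random relabelings whose KS statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical firm sizes (e.g., number of employees).
failed_sizes = list(range(10))
active_sizes = list(range(100, 110))
stat = ks_statistic(failed_sizes, active_sizes)
p_value = permutation_p_value(failed_sizes, active_sizes)
```

The same comparison can be repeated within each company form or industry sector by filtering the two samples first.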

[Figure: distributions P(X=a) of Size for Failed vs. Active companies]

Question(s) (B)

Compare the distributions of size/age/other of failed companies over different years.

❑ Is there any shift for a specific company form (SPA, SRL, etc.)?

❑ Is there any shift for a specific location (e.g., Tuscany, Lombardy, etc.)?
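
A descriptive first look at a shift between years is to compute the empirical distribution P(X=a) per year and measure how far apart two years are. The sketch below uses the total variation distance, which is an assumption of this example (it is a descriptive measure, not a significance test); the record layout, the field names `year` and `size_class`, and the data are hypothetical.

```python
from collections import Counter, defaultdict


def empirical_distribution(values):
    """Map each observed value a to its relative frequency P(X=a)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def distribution_by_year(records, feature):
    """Empirical distribution of one feature, computed separately per year."""
    by_year = defaultdict(list)
    for record in records:
        by_year[record["year"]].append(record[feature])
    return {year: empirical_distribution(vals) for year, vals in by_year.items()}


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


# Hypothetical failed companies with a discretized size.
failed = [
    {"year": 2016, "size_class": "small"},
    {"year": 2016, "size_class": "small"},
    {"year": 2016, "size_class": "medium"},
    {"year": 2016, "size_class": "large"},
    {"year": 2017, "size_class": "small"},
    {"year": 2017, "size_class": "medium"},
    {"year": 2017, "size_class": "medium"},
    {"year": 2017, "size_class": "large"},
]
dist = distribution_by_year(failed, "size_class")
shift = total_variation(dist[2016], dist[2017])
```

Filtering `failed` by company form or location before calling `distribution_by_year` gives the per-group version of the same comparison.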

[Figure: distributions P(X=a) of Size for failed companies, Year=2016 vs. Year=2017]

Questions (C)

What is the probability of failure conditional on size/age/other of firms in a specific year?

❑ Does it change for a specific company form (SPA, SRL, etc.)?
❑ Does it change for a specific industry sector (see ATECO sectors)?
❑ Does it change for a specific location (e.g., Tuscany, Lombardy, etc.)?

Probability of Default (PD) P(failure|X=a)
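
An empirical estimate of P(failure|X=a) is the share of failed firms among those with X=a. The following is a minimal sketch, assuming a discretized feature and a boolean `failed` flag per firm; the field names and the data are hypothetical.

```python
from collections import defaultdict


def failure_rate_by(records, feature):
    """Estimate P(failure | feature = a) as failures / firms for each value a."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        key = record[feature]
        totals[key] += 1
        failures[key] += int(record["failed"])
    return {key: failures[key] / totals[key] for key in totals}


# Hypothetical firms observed in one year.
firms = [
    {"size_class": "small", "failed": True},
    {"size_class": "small", "failed": False},
    {"size_class": "small", "failed": False},
    {"size_class": "small", "failed": False},
    {"size_class": "large", "failed": True},
    {"size_class": "large", "failed": False},
]
pd_by_size = failure_rate_by(firms, "size_class")
```

With few firms per group these estimates are noisy, so group sizes should be reported alongside the rates.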


Failure prediction

In addition to age, size, industry sector, and location, financial indicators that may correlate with failure have been widely studied in the literature, which motivates (credit/failure) scoring methods.


Failure prediction

External data may also be part of the model (especially for multi-annual data):

▪ Market indexes (GDP, etc.)
▪ Financial indexes (ECB interest rates, etc.)
▪ Stock indexes (MIB, etc.)


Scoring methods

Parametric models (non-exclusive list):

▪ Linear Discriminant Analysis
▪ Altman Z-score model

▪Logistic Regression
▪Penalized (or Elastic Net Regularization) Logistic Regression
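
As an illustration of a scoring method from this list, the sketch below computes the original Altman (1968) Z-score, which was derived for publicly traded manufacturing firms. Applying these coefficients and cut-offs to the project's data is an assumption of the example; unlisted companies would need one of the later variants from the literature. The function names and input values are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Original Altman (1968) Z-score as a weighted sum of five ratios."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z):
    """Classic cut-offs: above 2.99 safe, below 1.81 distress, grey in between."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


# Hypothetical balance-sheet figures (same currency unit throughout).
z = altman_z(working_capital=20, retained_earnings=30, ebit=10,
             market_value_equity=100, sales=150,
             total_assets=100, total_liabilities=100)
zone = altman_zone(z)
```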


Questions (D)

Fit a parametric model, and use it for failure scoring:

❑ use one or more parametric models
▪ you can rely on the literature: just cite the source of the adopted approach
▪ check whether the hypotheses of the model are satisfied (normality, absence of multicollinearity, etc.)!

❑ split data temporally
▪ use AUC-ROC and calibration plot as quality measures
▪ possibly, use other quality measures of your choice

❑ develop both a scoring model and a rating model
▪ scoring model: S(x) = probability of default
▪ rating model: R(x) = class of probability of default (see, e.g., bond credit rating classes)

❑ explain in depth the results of the approach
▪ in particular, the meaning of the fitted parameters, confidence intervals of the quality measures, statistical tests of the rating, and statistical tests comparing models.

Machine Learning Models (Random Forests, Gradient Boosted Trees, …) can also be fit/compared in addition to at least one parametric model.
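
The evaluation steps in Questions (D) can be sketched without any modelling library: a temporal split, a rank-based AUC-ROC, and a rating R(x) obtained by binning the score S(x). The scores below are assumed to come from an already fitted model; the rating cut-offs, class labels, field names, and data are all hypothetical.

```python
from bisect import bisect_right


def temporal_split(records, last_train_year):
    """Train on years up to last_train_year, test on the later years."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def auc_roc(labels, scores):
    """Probability that a failed firm gets a higher score than an active one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one failed and one active firm")
    # Ties between a failed and an active firm count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical rating classes: upper bounds on the probability of default.
RATING_CUTOFFS = [0.01, 0.05, 0.20]
RATING_LABELS = ["A", "B", "C", "D"]


def rating(score):
    """R(x): map a probability of default S(x) to a rating class."""
    return RATING_LABELS[bisect_right(RATING_CUTOFFS, score)]


# Hypothetical scored firms; 'score' plays the role of S(x).
records = [
    {"year": 2015, "failed": 0, "score": 0.02},
    {"year": 2016, "failed": 1, "score": 0.30},
    {"year": 2017, "failed": 0, "score": 0.10},
    {"year": 2017, "failed": 0, "score": 0.40},
    {"year": 2017, "failed": 1, "score": 0.35},
    {"year": 2017, "failed": 1, "score": 0.80},
]
train, test = temporal_split(records, last_train_year=2016)
test_auc = auc_roc([r["failed"] for r in test], [r["score"] for r in test])
test_ratings = [rating(r["score"]) for r in test]
```

The pairwise AUC is quadratic in the number of firms, which is fine for a sketch; on the full dataset a library implementation would be the more practical choice.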


Questions (E)

Extend/adapt/investigate your scoring model(s) with reference to one of these techniques:

❑uncertainty in score predictions, i.e., using confidence intervals for each prediction;

❑ or, selective classification, where:

$$S(w) = \begin{cases} s(w) & \text{if } g(w) \\ \text{abstain} & \text{otherwise} \end{cases}$$

❑ sensitivity of predictions w.r.t. noise
❑ impact of missing values on prediction performances
❑ … other advanced techniques studied in the course …
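
A selective classifier returns the score s(w) only when a selection function g(w) holds, and abstains otherwise. The sketch below is one possible instance, assuming g(w) accepts a prediction when the score is far enough from 0.5; this particular g, the `margin` parameter, and the sample scores are hypothetical choices for illustration.

```python
def selective_score(score, margin=0.2):
    """S(w): return s(w) when g(w) holds, otherwise abstain (None)."""
    # Hypothetical g(w): the score is at least `margin` away from 0.5.
    confident = abs(score - 0.5) >= margin
    return score if confident else None


def coverage(scores, margin=0.2):
    """Fraction of instances on which the classifier does not abstain."""
    kept = [s for s in scores if selective_score(s, margin) is not None]
    return len(kept) / len(scores)


# Hypothetical scores s(w) from a fitted model.
scores = [0.05, 0.45, 0.55, 0.90]
decisions = [selective_score(s) for s in scores]
kept_fraction = coverage(scores)
```

Selective classifiers are usually assessed by the trade-off between coverage and the quality measure computed on the accepted instances only.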


Resources

❑ AIDA dataset of (many) Italian companies: Teams/Files/Project
▪ with historical data (last 10 years from closing)

❑ Ateco 2007 classification of industry sectors
▪ Italian version of the European NACE classification
▪ https://www.istat.it/it/archivio/17888
▪ Excel file + description notes

❑ Reference literature on Teams/Files/Project
❑ Scientific paper indexes: Scholar, DBLP, arX
