Unveiling Regional Disparities

Integrating Multiple Data Sources for Small-Area Income Inequality Estimates

Jinjing Li, University of Canberra
ANZRSAI Conference 2024


Introduction

  • Income levels and their distribution at the small-area level provide important insights for evidence-based policymaking
  • Multiple data sources are available but are not trivial to combine; longitudinal data are rare
  • This paper integrates multiple data sources in a new framework to produce small-area income inequality estimates

Why Inequality at Small Area Level?

  • Policy Implications: Small-area estimates are crucial for targeted policy interventions
  • Peer Effects: Local conditions tend to have a strong effect on wellbeing
    • Health outcomes (Pickett & Wilkinson, 2015)
    • Economic mobility (Chetty et al., 2014)
    • Social cohesion, education outcomes etc.

Common Approaches

  • Estimate inequality only for areas large enough to have sufficient data
  • Reweight survey data to match selected census values
  • Use a hierarchical model with a random effect (often normally distributed) to provide small area estimates (Rao & Molina, 2015)
  • Impute values onto (unit-record) Census data, e.g. ELL (Elbers et al., 2003)

Key Challenges in Small-Area Inequality Estimation

Census Data

| Area | Attributes |
|------|------------|
| 1001 | 100 |
| 1002 | 200 |

  • Limited income information
  • Aggregate estimates

Survey Data

| Person | Area | Disp. Inc |
|--------|------|-----------|
| 1 | 1001 | 30000 |
| 2 | 1001 | 50000 |

  • Only source with disposable income
  • Few observations per area
  • Certain areas are under-sampled

Administrative Data

| Person/Area | Taxable Income |
|-------------|----------------|
| 1001 | 60000 |
| 1002 | 50000 |

  • Similar to Census but with potential sample bias

Unified Data Integration Models

  • Treat survey, census, and tax data as different samples in a "super" dataset with missing values
  • Income distribution is modelled as a mixture distribution
    • Spatially correlated income generation opportunities ("clusters") are defined by the area attributes, incorporating Census and other data
    • The income distribution within each cluster is defined by the population characteristics and the area people live in, incorporating taxation data
    • The survey is treated as a sample from selected areas and is used to estimate the income distribution parameters
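
The "super" dataset idea can be illustrated with a minimal sketch that stacks the three toy tables from the previous slide into one table, with `None` marking values a source does not observe. The field names (`disp_income`, `taxable_income`, `attribute`) and the helper `stack_sources` are hypothetical and only serve this illustration; they are not the paper's implementation.

```python
# Toy records copied from the three example tables; field names are assumptions.
survey = [
    {"person": 1, "area": 1001, "disp_income": 30000},
    {"person": 2, "area": 1001, "disp_income": 50000},
]
census = [
    {"area": 1001, "attribute": 100},
    {"area": 1002, "attribute": 200},
]
tax = [
    {"area": 1001, "taxable_income": 60000},
    {"area": 1002, "taxable_income": 50000},
]

COLUMNS = ["source", "person", "area", "attribute", "disp_income", "taxable_income"]


def stack_sources(**sources):
    """Stack records from every source; fields a source lacks become None."""
    rows = []
    for name, records in sources.items():
        for record in records:
            row = {column: record.get(column) for column in COLUMNS}
            row["source"] = name
            rows.append(row)
    return rows


super_dataset = stack_sources(survey=survey, census=census, tax=tax)

# Areas the survey never sampled still appear through the census and tax rows.
all_areas = {row["area"] for row in super_dataset}
areas_with_survey = {row["area"] for row in super_dataset if row["source"] == "survey"}
areas_without_survey = all_areas - areas_with_survey
```

In this toy case area 1002 has no survey respondents, yet it stays in the stacked table because the census and tax rows cover it.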

Theoretical and Empirical Consistency

  • Aligns with the microeconomic foundations of income generation
  • Local administrative data help anchor local estimates to reliable values
  • Exploits spatial correlations in the underlying income opportunities (rather than arbitrary spatial smoothing of income or inequality)
  • Links individual income to the small-area and national income distributions

Methodology Overview

The mixture model framework

  • Income comes from survey data
  • Area attributes come from census, tax and weather data at the area level
  • Each mixture component has its own income distribution, based on a Singh-Maddala distribution
  • Each component distribution has its own parameters

Methodology Overview

The mixture model framework

  • Mixing proportions weight the mixture components
  • The number of mixture components is a modelling choice
  • Survey data need not cover every area, provided Census and tax data do
  • Captures multi-modal income patterns
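
As a sketch of the framework described on these two slides, a finite mixture with Singh-Maddala components can be written as follows. The notation is an assumption made for this illustration: $y$ is income, $\mathbf{z}_a$ collects the attributes of area $a$, $K$ is the number of components, $\pi_k$ are the mixing proportions and $\theta_k = (a_k, b_k, q_k)$ are the component parameters. The component parameters may also depend on covariates in the full model; they are held fixed per component here for simplicity.

```latex
% Mixture density for income y in area a (assumed notation)
f(y \mid \mathbf{z}_a) = \sum_{k=1}^{K} \pi_k(\mathbf{z}_a)\, f_k(y;\theta_k),
\qquad \pi_k(\mathbf{z}_a) \ge 0, \quad \sum_{k=1}^{K} \pi_k(\mathbf{z}_a) = 1

% Singh-Maddala (Burr XII) density of component k, with a_k, b_k, q_k > 0
f_k(y;\theta_k) = \frac{a_k q_k\, y^{a_k - 1}}
                       {b_k^{a_k}\left[1 + (y/b_k)^{a_k}\right]^{q_k + 1}},
\qquad y > 0
```

Because the mixing proportions depend only on area attributes, the density is defined for any area with Census and tax coverage, whether or not the survey sampled it.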

Advantages

  • Respects the overall spatial correlation in income generation opportunities and income level
  • Takes into account the spatial heterogeneity and demographic variations, including abrupt shifts
  • Avoids strong parametric assumptions on income distribution shape or modality
  • Anchored to administrative data
  • Can be extended to arbitrarily small area levels, given local administrative data
  • Uncertainty is quantified as standard errors of the estimates

Data Integration

Census Data and Weather Data

  • Informs mixing proportions
  • Demographic factors
  • Geographic variables (used in RBF)
  • Rainfall data

Survey Data

  • Distribution parameters
  • Income information
  • Household characteristics

Tax Records

  • Anchors estimates of the mean income level
  • Accurate small-area data for selected population groups
  • Administrative validation

Areas are standardised to the SA3 level for this analysis
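
The division of labour between the sources can be sketched in a few lines: area attributes drive the mixing proportions, while the component parameters describe the income distribution. The softmax link, the coefficient values and the parameter values below are all assumptions chosen for illustration; the slides do not state the functional form used in the paper.

```python
import math


def mixing_proportions(area_attrs, coefs):
    """Softmax over linear scores, one hypothetical coefficient list per component."""
    scores = [sum(c * x for c, x in zip(coef, area_attrs)) for coef in coefs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def singh_maddala_pdf(y, a, b, q):
    """Singh-Maddala (Burr XII) density with shapes a, q and scale b."""
    if y <= 0:
        return 0.0
    t = (y / b) ** a
    return (a * q / y) * t / (1.0 + t) ** (q + 1.0)


def singh_maddala_cdf(y, a, b, q):
    """Singh-Maddala distribution function."""
    if y <= 0:
        return 0.0
    return 1.0 - (1.0 + (y / b) ** a) ** (-q)


def mixture_pdf(y, weights, params):
    """Weighted sum of component densities."""
    return sum(w * singh_maddala_pdf(y, *p) for w, p in zip(weights, params))


# Toy inputs: an intercept plus one standardised area attribute (made up).
area_attrs = [1.0, 0.3]
coefs = [[0.0, 0.0], [0.5, -1.0], [-0.2, 2.0]]
# One (a, b, q) triple per component (made up).
params = [(2.5, 40000.0, 1.5), (3.0, 60000.0, 1.2), (2.0, 90000.0, 2.0)]

weights = mixing_proportions(area_attrs, coefs)
density_at_50k = mixture_pdf(50000.0, weights, params)
```

In an estimation setting the coefficients and component parameters would be fitted to the data rather than set by hand.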


Results - National Estimates


Results

Gini distribution shift


Results

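| Year | ABS (SIH) | HILDA | Aggregated from New Model SAE* |
|------|-----------|-------|--------------------------------|
| 2013 | 0.333 | 0.301 | 0.303 |
| 2014 |       | 0.296 | 0.300 |
| 2015 | 0.323 | 0.296 | 0.299 |
| 2016 |       | 0.304 | 0.305 |
| 2017 | 0.328 | 0.300 | 0.301 |
| 2018 |       | 0.305 | 0.305 |
| 2019 | 0.324 | 0.289 | 0.288 |
| 2020 |       | 0.296 | 0.295 |
| 2021 |       | 0.322 | 0.321 |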

* Includes areas covered by the Census but not sampled by the ABS SIH or HILDA
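
For readers less familiar with the statistic being compared, here is a minimal sketch of a Gini coefficient computed from a (possibly weighted) sample via the area under the Lorenz curve. It is a generic textbook calculation, not the aggregation procedure used to produce the table, and the function name `gini` is chosen for this example.

```python
def gini(incomes, weights=None):
    """Gini coefficient from the area under the weighted Lorenz curve."""
    if weights is None:
        weights = [1.0] * len(incomes)
    pairs = sorted(zip(incomes, weights))  # order units from poorest to richest
    total_weight = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)
    cumulative_income = 0.0
    lorenz_area = 0.0
    for y, w in pairs:
        previous = cumulative_income
        cumulative_income += y * w
        # Trapezoid between consecutive points on the Lorenz curve.
        lorenz_area += w * (previous + cumulative_income) / 2.0
    lorenz_area /= total_weight * total_income
    return 1.0 - 2.0 * lorenz_area


# The two toy survey incomes used earlier in the deck.
toy_gini = gini([30000, 50000])
```

With the two toy survey incomes (30000 and 50000) the function returns 0.125; a sample of identical incomes returns 0.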


Results - Clusters

Number of clusters (income generation opportunities)

Sensitivity analysis on the number of clusters


Results

Correlation between Area Characteristics and Cluster Membership

| Cluster | 1 | 2 | 3 | 4 | 5 | 6 |
|---------|---|---|---|---|---|---|
| Prop Age 65+ | 0.241 | 0.037 | -0.467 | -0.230 | 0.004 | 0.049 |
| Prop Age <15 | 0.125 | 0.085 | -0.122 | 0.425 | -0.292 | -0.066 |
| Prop Benefits (DSS data) | 0.708 | 0.200 | -0.419 | 0.381 | -0.533 | -0.219 |
| Prop Employed | -0.699 | -0.122 | 0.408 | -0.548 | 0.486 | 0.162 |
| Taxable Income | -0.511 | -0.194 | 0.250 | -0.356 | 0.614 | 0.212 |
| Capital Region | -0.492 | -0.168 | 0.068 | 0.039 | 0.174 | 0.246 |
| Population in 2021 ('000) | 2040.146 | 2014.519 | 2000.428 | 194.662 | 1078.213 | 17928.686 |

Results: Cluster Analysis

Six distinct clusters identified:
  1. Welfare-dependent population
  2. Lower-income/rural population
  3. Employed with fewer dependents
  4. Low-income with children
  5. High-income
  6. General population (rest)


Temporal Trends

  • Relatively stable until the COVID period
  • Impact of socio-economic events
  • Regional convergence/divergence


Results

| Area Characteristics | Correlation with Gini |
|----------------------|-----------------------|
| Prop Age 65+ | 0.14 |
| Prop Age <15 | -0.61 |
| Prop Benefits | -0.47 |
| Prop Employed | 0.51 |
| Taxable Income | 0.74 |
| In Greater Capital Region | 0.03 |
| Population Size (2021) | -0.10 |

Results

Areas with highest income inequality in 2021

| Rank | SA3 Name | Region | Gini | Avg Taxable Inc |
|------|----------|--------|------|-----------------|
| 1 | Cottesloe - Claremont | Greater Perth | 0.413 | 170003 |
| 2 | Perth City | Greater Perth | 0.397 | 97562 |
| 3 | Stonnington - West | Greater Melbourne | 0.394 | 152764 |
| 4 | Molonglo | ACT | 0.387 | 88053 |
| 5 | Brisbane Inner | Greater Brisbane | 0.382 | 90497 |

Results

Areas with lowest income inequality in 2021

| Rank | SA3 Name | Region | Gini | Avg Taxable Inc |
|------|----------|--------|------|-----------------|
| 1 | Springfield - Redbank | Greater Brisbane | 0.253 | 60196 |
| 2 | Casey - South | Greater Melbourne | 0.257 | 58780 |
| 3 | North Lakes | Greater Brisbane | 0.258 | 64935 |
| 4 | Rocklea - Acacia Ridge | Greater Brisbane | 0.261 | 59105 |
| 5 | Browns Plains | Greater Brisbane | 0.262 | 54273 |

Further Research

  • Identify and isolate the effects of different income sources and of inequality of opportunity
  • Extend to arbitrarily small area levels (e.g. SA1) using local administrative data
  • Potential extension to other metrics of interest (e.g. wellbeing, health conditions, wealth, extended poverty, etc.)
  • International comparisons

Thank You

Questions?

Contact Information:
Jinjing Li

Spatial and Temporal Modeling

Temporal Evolution

  • Year dummy variables
  • Captures structural changes
  • Flexible time trends

Spatial Effects

  • RBF interpolation
  • Exponential kernel function
  • Reference points based on capital areas
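
One way to picture the spatial terms is as radial basis features: each area receives one feature per reference point, decaying exponentially with distance. The sketch below computes only these basis features, not a full interpolation, whose weights would have to be estimated. The choice of three capitals, their approximate coordinates, the great-circle distance and the 500 km length scale are all assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Approximate (latitude, longitude) of three capitals, used as reference points.
REFERENCE_POINTS = {
    "Sydney": (-33.87, 151.21),
    "Melbourne": (-37.81, 144.96),
    "Perth": (-31.95, 115.86),
}


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rbf_features(point, length_scale_km=500.0):
    """Exponential-kernel features exp(-distance / length_scale), one per reference point."""
    return {
        name: math.exp(-haversine_km(point, ref) / length_scale_km)
        for name, ref in REFERENCE_POINTS.items()
    }


# Example: features for a location near Canberra.
canberra = (-35.28, 149.13)
features = rbf_features(canberra)
```

Each feature equals 1 at its reference point and shrinks towards 0 far away, so nearby areas get similar feature vectors.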

Spatial Patterns

  • Significant variation across regions
  • Urban-rural differences
  • Capital city effects
