
Processing Layer - Analytics & Machine Learning

Processing Layer: Core Idea & Novel Contributions

What is the Processing Layer?

The Processing Layer transforms standardized data into insights through analytics, machine learning, and complex computations. It’s where raw data becomes actionable intelligence.

Core Responsibilities

1. ESGETC Scoring & Aggregation

Converts individual data points into consolidated dimension scores:

Input: 50 individual metrics
  - Employees (10)
  - Diversity ratio (0.42)
  - Revenue growth (8%)
  - ... (47 more metrics)
        ↓ Standardization & Weighting
Process:
  - Normalize each metric to a 0-100 scale
  - Apply the dimension weight (e.g. 40% for Economic if the CEO prioritized it)
  - Calculate the dimension average
  - Quality-weight the result
        ↓
Output: ECONOMIC dimension score = 72
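The steps above can be sketched in a few lines. The normalization bounds, quality factor, and example weights here are illustrative assumptions, not the platform's calibrated parameters:

```python
def normalize(value, lo, hi):
    # Min-max scale a raw metric onto 0-100, clamped at the bounds
    return max(0.0, min(100.0, (value - lo) / (hi - lo) * 100.0))

def dimension_score(metrics, quality=1.0):
    # metrics: list of (value, lo, hi); quality in [0, 1] discounts the
    # dimension score when the underlying data quality is low
    normalized = [normalize(v, lo, hi) for v, lo, hi in metrics]
    return quality * sum(normalized) / len(normalized)

def composite_score(dimension_scores, weights):
    # Weighted average of dimension scores, e.g. Economic weighted at 40%
    total = sum(weights.values())
    return sum(dimension_scores[d] * w for d, w in weights.items()) / total

# Diversity ratio (0.42 on a 0-1 scale) and revenue growth (8% in a
# hypothetical -10%..30% band), quality-weighted at 0.95
econ = dimension_score([(0.42, 0.0, 1.0), (8.0, -10.0, 30.0)], quality=0.95)
```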

2. Triple Materiality Assessment (3D Scoring)

Evaluates impact through three lenses:

Financial Materiality: Business impact

  • How does this issue affect revenue, costs, profitability?
  • What’s the financial risk if we don’t address it?
  • How much would improvement be worth?

Impact Materiality: Stakeholder effect

  • Who is affected and how severely?
  • Does this connect to UN SDG targets?
  • Alignment with stakeholder priorities?

Systemic Materiality: System-wide influence

  • Does this affect broader systems and others downstream?
  • Are there feedback loops or cascading effects?
  • What’s the leverage for transformation?

Result: 3D position in 8-octant decision space
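A minimal way to derive the octant is to threshold each lens. The 0-100 scale and the 50-point cut-off are assumptions for illustration:

```python
def materiality_octant(financial, impact, systemic, threshold=50.0):
    # Threshold each materiality lens (assumed 0-100 scores) to place the
    # issue in one of 2^3 = 8 octants, encoded as a 3-bit index
    f = financial >= threshold
    i = impact >= threshold
    s = systemic >= threshold
    index = (f << 2) | (i << 1) | int(s)
    flags = {"financial_high": f, "impact_high": i, "systemic_high": s}
    return index, flags
```

Octant 7 (high on all three lenses) marks the most clearly material issues; octant 0 marks issues material on no lens.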

3. Stakeholder Salience Analysis

Maps stakeholder influence and importance:

[Diagram: stakeholders plotted on a Legitimacy (vertical) × Urgency (horizontal) grid, with Power determining salience — Definitive stakeholders (holding power, legitimacy, and urgency) at top priority, Dependent stakeholders (legitimate and urgent but lacking power) at the margin, and groups lacking most attributes at low priority.]

Each stakeholder placed based on:

  • Legitimacy: Legal/moral/culturally acceptable stake
  • Urgency: Time sensitivity of concern
  • Power: Ability to compel attention

Generates stakeholder engagement strategy automatically.
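One way to sketch the mapping from attributes to an engagement class, following the Mitchell-style salience model; the 50-point presence threshold and the class names beyond those in the diagram are illustrative:

```python
def salience_class(power, legitimacy, urgency, threshold=50.0):
    # Count which attributes (0-100 scores) a stakeholder possesses; the
    # more attributes present, the higher the engagement priority
    present = sum(score >= threshold for score in (power, legitimacy, urgency))
    if present == 3:
        return "definitive"   # engage closely and immediately
    if present == 2:
        return "expectant"    # e.g. dependent: legitimacy + urgency, no power
    if present == 1:
        return "latent"       # monitor; low priority for now
    return "non-stakeholder"
```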

4. Benchmark Comparison

Positions organization against peers:

Your Organization (score: 65) vs. Benchmarks:
Economic Dimension:
- Global Average: 58
- Industry Median: 62
- Top Quartile: 78
Your position: Above average, room for improvement
Recommendation: Focus on top-quartile companies to understand gaps

This enables organizations to:

  • Identify strengths and weaknesses relative to peers
  • Set realistic targets
  • Motivate improvement through comparison
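The positioning logic can be sketched as a percentile rank plus a gap-to-top-quartile computation (a simplification; the real benchmark engine also segments by sector and region):

```python
import statistics

def benchmark_position(score, peer_scores):
    # Percentile rank of `score` among peers, plus the gap to close
    # to reach the top quartile
    peers = sorted(peer_scores)
    below = sum(1 for p in peers if p < score)
    top_quartile = peers[int(0.75 * (len(peers) - 1))]
    return {
        "percentile": 100.0 * below / len(peers),
        "peer_median": statistics.median(peers),
        "gap_to_top_quartile": max(0.0, top_quartile - score),
    }

position = benchmark_position(65, [50, 58, 62, 78])
```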

5. Machine Learning Pipelines

Multiple ML models working in parallel:

Predictive Models

  • What will ESGETC scores be in 12 months?
  • Which organizations are most likely to succeed or fail?
  • Where are the anomalies and risks?

Classification Models

  • Categorize entities automatically (business, NGO, university)
  • Assign sector classifications
  • Detect document types and extract metadata

Recommendation Models

  • Which SDG targets most relevant for this org?
  • What actions will have highest impact?
  • Which partners should this org connect with?

Clustering Models

  • Find organizations similar to this one
  • Identify market segments
  • Discover unexpected groupings

Novel Contributions

1. Adaptive Algorithms & Contextualization

Unlike static models, we adapt weighting based on context:

Organization Type: Manufacturing vs. NGO use different weightings

  • Manufacturing: Economic and Environmental most material
  • NGO: Social and Connectedness more important

Geographic Context: Regional priorities matter

  • Sub-Saharan Africa: Social dimension especially critical (poverty, health)
  • Developed economy: Connectedness and Governance higher weight

Sector-Specific: Industry standards inform modeling

  • SASB framework provides sector-specific indicators
  • Different sectors face different material issues
  • Weighting automatically adjusts per sector

Phase of Business: Stage-of-development matters

  • Startup: Innovation and scaling take priority
  • Mature: Efficiency and governance more important
  • Declining: Resilience and stakeholder management critical

Result: Organizations see benchmarks and recommendations tailored to their context rather than a generic, one-size-fits-all view.
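One way to realize this contextualization is a multiplicative adjustment over base weights. The factor values below are placeholders, not the platform's calibrated numbers:

```python
# Base dimension weights and context adjustments (placeholder values)
BASE_WEIGHTS = {"economic": 1.0, "environmental": 1.0, "social": 1.0,
                "governance": 1.0, "connectedness": 1.0}

ADJUSTMENTS = {
    ("org_type", "manufacturing"): {"economic": 1.4, "environmental": 1.4},
    ("org_type", "ngo"): {"social": 1.4, "connectedness": 1.3},
    ("region", "sub_saharan_africa"): {"social": 1.5},
    ("phase", "startup"): {"economic": 1.2},
}

def contextual_weights(context):
    # Multiply base weights by every adjustment that matches the context,
    # then renormalize so the weights sum to 1
    weights = dict(BASE_WEIGHTS)
    for (attr, value), factors in ADJUSTMENTS.items():
        if context.get(attr) == value:
            for dim, factor in factors.items():
                weights[dim] *= factor
    total = sum(weights.values())
    return {dim: w / total for dim, w in weights.items()}
```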

2. Continuous Learning Engine

The platform learns from outcomes:

Feedback Loop

  • Organization sets target: “Improve social score from 55 to 70”
  • System recommends actions based on ML model
  • Organization executes for 6 months
  • System measures actual outcome (new score: 68)
  • Model updates: “These 3 types of actions worked well, this one didn’t”
  • Next recommendation refined by learning
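The model update in the loop above can be sketched as an exponential moving estimate per action type; the learning rate is an assumption, and the production learner is presumably more sophisticated:

```python
def update_effectiveness(estimates, action, observed_lift, lr=0.3):
    # Move the stored effectiveness estimate for `action` toward the
    # score lift actually observed, so future recommendations rank
    # actions by what has worked in practice
    current = estimates.get(action, observed_lift)
    estimates[action] = current + lr * (observed_lift - current)
    return estimates

history = {"employee_training": 15.0}   # expected +15 score points
update_effectiveness(history, "employee_training", observed_lift=13.0)
```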

Over time:

  • Models become more accurate for your sector/region
  • Recommendations improve
  • False positives/negatives decrease
  • The organization sees a positive trend: “the system is getting smarter”

3. Anomaly Detection & Benchmark Engine

Real-time alerting for unusual patterns:

Type 1: Data Anomalies

  • “Revenue dropped 60% overnight” → Likely data error
  • “CO2 emissions up 400% this month” → Real concern or sensor malfunction?
  • System flags for investigation

Type 2: Performance Anomalies

  • “This org’s social score stable while all peers improving” → Why?
  • “Economic dimension crashed unexpectedly” → Early warning
  • “Score improving faster than industry norm” → Potential best practice

Type 3: Predictive Anomalies

  • “Based on trends, this org will fail to meet target” → Intervention opportunity
  • “This org at high risk based on peer failures” → Preventive action
  • “Network showing early signs of disruption” → Systemic risk alert
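The data-anomaly case can be sketched with statistical process control: flag points far outside the series' normal variation. This is a minimal check; the platform also uses isolation forests and autoencoders:

```python
import statistics

def flag_anomalies(series, z=3.0):
    # Statistical-process-control check: flag indices of points more
    # than `z` standard deviations from the series mean
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mean) / sd > z]

# "CO2 emissions up 400% this month" shows up as an outlier to investigate
flagged = flag_anomalies([100, 98, 103, 101, 500], z=1.5)
```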

4. Big Data Pipeline & Supply Chain Analysis

Handles enterprise-scale data:

Scale

  • 1B+ data points processed
  • 100K+ organizations analyzed
  • 1M+ supply chain relationships mapped
  • Real-time processing of IoT streams

Supply Chain Traceability

  • Map supplier networks 3+ tiers deep
  • Identify critical nodes (single-source-of-supply risks)
  • Assess end-to-end sustainability
  • Find leakage points and inefficiencies

Network Analysis

  • Identify clusters and gaps in supply chain
  • Find substitution opportunities (alternative suppliers)
  • Assess concentration risk
  • Model resilience to disruptions
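Two of the checks above — single-source risks and concentration risk — can be sketched over the mapped edge list with plain dictionaries (a simplification; the platform's network analysis runs at far larger scale):

```python
from collections import defaultdict

def supply_chain_risks(edges):
    # edges: (buyer, supplier) pairs from the mapped supplier network
    suppliers_of = defaultdict(set)   # buyer -> its suppliers
    buyers_of = defaultdict(set)      # supplier -> its buyers
    for buyer, supplier in edges:
        suppliers_of[buyer].add(supplier)
        buyers_of[supplier].add(buyer)
    # Single-source-of-supply: buyers with exactly one supplier
    single_sourced = {b for b, s in suppliers_of.items() if len(s) == 1}
    # Concentration: share of all buyers depending on each supplier
    n_buyers = len(suppliers_of)
    concentration = {s: len(b) / n_buyers for s, b in buyers_of.items()}
    return single_sourced, concentration

edges = [("A", "X"), ("B", "X"), ("B", "Y"), ("C", "X")]
single, conc = supply_chain_risks(edges)
```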

Technical Architecture

Lambda Architecture (Batch + Real-Time)

                Live Data Stream
                       │
          ┌────────────┴────────────┐
          ↓                         ↓
     Speed Layer               Batch Layer
     (Real-time)               (Accuracy)
          │                         │
    Spark Streaming            Spark Batch
    Process in <5s           Process nightly
          │                         │
          └────────────┬────────────┘
                       ↓
                Serving Layer
         (Database, query-optimized)
                       │
              View / Query Results

Processing Jobs

Job                  Frequency    Runtime    Purpose
───────────────────  ───────────  ─────────  ─────────────────────────────
ESGETC Scoring       Real-time    <100 ms    Calculate dimension scores
3D Materiality       Real-time    <200 ms    Triple-lens assessment
Benchmarking         Nightly      30 min     Compare to peers
ML Predictions       Weekly       2 hours    Forecasts & recommendations
Anomaly Detection    Continuous   <5 s       Alert on issues
Supply Chain         On-demand    5-60 min   Map relationships
Delphi Processing    Per-round    1 hour     Consensus calculation

Distributed Computing

For large-scale processing:

Master Node
├─ Task Scheduler
└─ Job Coordinator
Worker Nodes (10-100)
├─ Spark Executor
├─ Data Cache (Redis)
└─ ML Model Serving
Distributed Storage
├─ Raw Data Lake (Parquet)
├─ Processed Results (PostgreSQL)
└─ Cache Layer (Redis Cluster)

The scheduler maps processing jobs to available resources and scales worker capacity automatically.


Machine Learning Models

Supervised Learning

Classification

  • Entity type (business, NGO, university, government)
  • Sector assignment (NAICS codes)
  • Risk category (low/medium/high)

Regression

  • Predict next-period ESGETC scores
  • Estimate organization size from indicators
  • Forecast financial impact

Unsupervised Learning

Clustering

  • Find similar organizations
  • Identify market segments
  • Discover new partnership opportunities

Anomaly Detection

  • Isolation forests for data anomalies
  • Statistical process control for performance anomalies
  • Autoencoders for pattern detection

NLP Models

Entity Extraction

  • Identify organization names, locations, sectors from text
  • Extract SDG mentions and sentiment
  • Classify sustainability claims

Classification

  • Categorize documents (annual reports, policies)
  • Assess sustainability commitment level
  • Flag potential greenwashing

Performance Optimization

Caching Strategy

Query → Check Cache (1ms)
├─ Hit: Return cached result
└─ Miss: Compute (100-1000ms) → Cache (1hour TTL)
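A minimal read-through cache implementing this flow, with an in-process dictionary standing in for the Redis layer; the 1-hour TTL matches the diagram:

```python
import time

def cached(compute, ttl=3600):
    # Read-through cache: return a stored result while it is fresh,
    # otherwise recompute and store with a timestamp (TTL in seconds)
    store = {}
    def lookup(key):
        entry = store.get(key)
        if entry is not None and time.monotonic() - entry[1] < ttl:
            return entry[0]                      # cache hit: fast path
        value = compute(key)                     # cache miss: expensive path
        store[key] = (value, time.monotonic())
        return value
    return lookup

calls = []
get_score = cached(lambda org_id: calls.append(org_id) or org_id * 2)
```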

Incremental Processing

Instead of reprocessing all 1B data points:

  • Only process new/changed data
  • Update aggregates incrementally
  • Maintain composite score materializations

Result: Score updates in <1 second instead of hours.
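The core trick is folding each new data point into stored aggregates instead of recomputing over the full series, e.g. a running mean:

```python
def update_mean(mean, count, new_value):
    # Fold one new data point into a running mean without touching the
    # rest of the series (single-pass incremental update)
    count += 1
    mean += (new_value - mean) / count
    return mean, count

mean, n = 0.0, 0
for x in (60.0, 70.0, 80.0):
    mean, n = update_mean(mean, n, x)
# mean is now the average of all points seen, with O(1) work per update
```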

Vectorization

Use matrix operations instead of loops:

import numpy as np

# Slow: pure-Python loop over individual metrics
result = []
for metric in metrics:
    result.append(metric * weight)

# Fast: NumPy vectorization (often ~100x faster on large arrays)
result = np.asarray(metrics) * weight

Best Practices

1. Data Quality First

Even the best algorithm fails on poor input data.

2. Interpretable Models

Avoid black-box models when possible. Be able to explain recommendations.

3. Regular Retraining

Models degrade over time. Retrain monthly/quarterly.

4. Monitor for Drift

Check if model predictions match reality. If not, retrain.

5. A/B Test Changes

Before deploying new model/weighting, test on subset.

6. Document Assumptions

Every model uses assumptions. Document them clearly.


Next Steps