Puerto Rico Rent Analytics

Dashboard Overview

This dashboard provides data-driven insights into the San Juan rental market using machine learning to identify pricing patterns and potential opportunities. Data is collected twice daily from public listings ($550–$3,000 range) across Condado/Miramar, Hato Rey, Santurce, and Viejo San Juan, and analyzed using an ensemble of regression models.

Understanding the Metrics

Great Deals / Top Underpriced Opportunities

Listings with residual < -$200, anomaly_score > 2.0, and pred_std ≤ $100. These are priced more than $200 below the predicted market value, are flagged as statistically unusual, and come with close agreement among the ensemble models (low pred_std). The stats card count and the deals table use the same thresholds.
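The three thresholds can be read as a single predicate. The following is a minimal sketch, not the dashboard's actual code: the dictionary keys mirror the metric names used above, while the function name and the sample listings are made up for illustration.

```python
# Thresholds quoted in the dashboard text.
MAX_RESIDUAL = -200.0     # residual must be below this (dollars)
MIN_ANOMALY_SCORE = 2.0   # anomaly_score must be above this
MAX_PRED_STD = 100.0      # pred_std must be at or below this (dollars)


def is_great_deal(listing: dict) -> bool:
    """Return True when a listing passes all three thresholds."""
    return (
        listing["residual"] < MAX_RESIDUAL
        and listing["anomaly_score"] > MIN_ANOMALY_SCORE
        and listing["pred_std"] <= MAX_PRED_STD
    )


# Made-up listings, for illustration only.
listings = [
    {"id": "a", "residual": -350.0, "anomaly_score": 2.6, "pred_std": 80.0},
    {"id": "b", "residual": -150.0, "anomaly_score": 2.4, "pred_std": 60.0},   # discount too small
    {"id": "c", "residual": -400.0, "anomaly_score": 3.1, "pred_std": 140.0},  # models disagree
]

great_deals = [item for item in listings if is_great_deal(item)]
print([item["id"] for item in great_deals])  # ['a']
```

Note that the residual and anomaly score comparisons are strict, while the pred_std comparison is inclusive, matching the symbols in the text.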

Key Terminology

Neighborhood: Geographic area within San Juan (e.g., Condado, Santurce)
Residual: Actual price minus predicted price (negative = underpriced)
Anomaly Score: Statistical measure of how unusual a listing is (higher = more unusual)
Prediction Spread (pred_std): Standard deviation of the predictions across ensemble models (lower = models agree, higher confidence)
Median Absolute Error (MDAE): Model accuracy metric, the median absolute prediction error in dollars (robust to outliers)
Days on Market: Time since listing first appeared (active listings only)
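Several of these terms reduce to one-line calculations. The sketch below shows residual, median absolute error, and days on market under the definitions given above. The function names and the sample numbers are illustrative assumptions, and the anomaly score is left out because the page does not say how it is computed.

```python
from datetime import date
from statistics import median


def residual(actual: float, predicted: float) -> float:
    """Actual minus predicted; negative means priced below the prediction."""
    return actual - predicted


def median_absolute_error(actuals, predictions) -> float:
    """Median of |actual - predicted| over all listings."""
    return median(abs(a - p) for a, p in zip(actuals, predictions))


def days_on_market(first_seen: date, today: date) -> int:
    """Whole days between the first sighting of a listing and today."""
    return (today - first_seen).days


# Made-up prices; the last listing has a large error on purpose.
actuals = [1200.0, 950.0, 2100.0, 1500.0, 3000.0]
predictions = [1350.0, 900.0, 2000.0, 1525.0, 2200.0]

print(residual(1200.0, 1350.0))                             # -150.0
print(median_absolute_error(actuals, predictions))          # 100.0
print(days_on_market(date(2024, 1, 1), date(2024, 1, 15)))  # 14
```

The single $800 miss would pull a mean absolute error up to $225, while the median stays at $100, which is what "robust to outliers" refers to.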

How to Use This Dashboard

Identify Value: Check "Top Underpriced Opportunities" for potential deals
Track Trends: Monitor median price changes over time by neighborhood
Assess Market: Review price distribution and inventory levels
Validate Listings: Cross-reference model predictions with actual listings before making decisions
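For readers who want to reproduce the trend view from exported data, weekly medians by neighborhood could be computed along these lines. This is a sketch under assumptions: the input shape (neighborhood, date seen, price) and the use of ISO calendar weeks are illustrative choices, not a description of how the dashboard buckets its data.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def weekly_median_prices(observations):
    """Group prices by (neighborhood, ISO year, ISO week) and take the median."""
    buckets = defaultdict(list)
    for neighborhood, seen_on, price in observations:
        iso_year, iso_week = seen_on.isocalendar()[:2]
        buckets[(neighborhood, iso_year, iso_week)].append(price)
    return {key: median(prices) for key, prices in sorted(buckets.items())}


# Made-up observations, for illustration only.
observations = [
    ("Santurce", date(2024, 1, 1), 1000.0),
    ("Santurce", date(2024, 1, 3), 1200.0),
    ("Santurce", date(2024, 1, 8), 1300.0),
    ("Hato Rey", date(2024, 1, 2), 900.0),
]

trend = weekly_median_prices(observations)
for key, value in trend.items():
    print(key, value)
```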

Important: This tool is for analysis and exploration only. Always verify listings independently and consult professionals before making rental decisions.

Median Price Trends (weekly)

Price Distribution (current inventory)

Inventory by Neighborhood (active listings)

Days on Market (active listings)

Model Feature Importance (ensemble weights)

Price Distribution by Neighborhood (quartile analysis)

Price vs Prediction Scatter (green = underpriced, red = overpriced)

Top Underpriced Opportunities (potential market inefficiencies)

Listings with residual < -$200, anomaly_score > 2.0, and pred_std ≤ $100 — priced well below model predictions with high statistical confidence and strong model consensus.

Status	Location	Beds/Baths	Actual	Predicted	Savings	Score	Spread	Features	Link

Other Discounts (moderate savings)

Listings with anomaly_score > 2.0 and pred_std ≤ $100 that are priced below model predictions, but whose residual does not reach the -$200 threshold required for top deals.

Location	Beds/Baths	Actual	Predicted	Savings	Score	Spread	Features	Link

Disclaimer

Data Sources

Data is collected from publicly available sources and structured inputs using automated extraction and ingestion processes. The dataset represents snapshots captured at specific points in time and may not reflect the most current state of listings or prices.

Data Limitations

The data may contain inaccuracies, missing fields, duplicates, or outdated information originating from source material or automated parsing. Coverage may be uneven across regions, time periods, or listing types. The system does not guarantee correctness, completeness, or timeliness.

Data Processing

Raw data is cleaned, normalized, and transformed prior to modeling. This includes deduplication, handling missing or inconsistent values, and deriving structured features used for analysis. Some information may be simplified or discarded during processing.
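As a rough illustration of what such a cleaning pass might look like, the sketch below normalizes neighborhood names, drops rows with missing required fields, and removes duplicates. Everything about the schema is an assumption: the field names, the duplicate key, and the choice to filter on the $550 to $3,000 range mentioned in the overview are hypothetical and not taken from the actual pipeline.

```python
MIN_PRICE, MAX_PRICE = 550, 3000  # price range stated in the dashboard overview


def clean_listings(raw_rows):
    """Normalize, filter, and deduplicate raw listing rows (hypothetical schema)."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        price = row.get("price")
        neighborhood = (row.get("neighborhood") or "").strip().title()
        # Drop rows missing a required field.
        if price is None or not neighborhood:
            continue
        # Drop rows outside the collected price range (assumed filter).
        if not MIN_PRICE <= price <= MAX_PRICE:
            continue
        # Assumed duplicate key: same area, size, and price.
        key = (neighborhood, row.get("beds"), row.get("baths"), price)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**row, "neighborhood": neighborhood})
    return cleaned


# Made-up rows, for illustration only.
raw = [
    {"neighborhood": " santurce ", "beds": 2, "baths": 1, "price": 1200},
    {"neighborhood": "Santurce", "beds": 2, "baths": 1, "price": 1200},  # duplicate
    {"neighborhood": "Hato Rey", "beds": 1, "baths": 1, "price": None},  # missing price
    {"neighborhood": "Condado", "beds": 3, "baths": 2, "price": 4500},   # out of range
]

cleaned_rows = clean_listings(raw)
print(cleaned_rows)
```

A key built from area, size, and price will occasionally merge two genuinely different units, which is one way information can be "simplified or discarded during processing".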

Machine Learning Models (ensemble_v2)

Predictions are generated using a weighted ensemble of supervised regression models: Ridge Regression, Random Forest Regression, and Gradient Boosting Regression.
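To make the idea of a weighted ensemble concrete, the sketch below combines three per-model predictions into one estimate and measures how much the models disagree. The weights and the sample predictions are invented for illustration, since the page does not publish the ensemble_v2 weights, and the choice of population standard deviation for the spread is likewise an assumption.

```python
from statistics import pstdev

# Hypothetical weights; the real ensemble_v2 weights are not stated on the page.
WEIGHTS = {"ridge": 0.2, "random_forest": 0.4, "gradient_boosting": 0.4}


def ensemble_predict(model_preds: dict, weights: dict) -> float:
    """Weighted average of per-model predictions, normalized by total weight."""
    total = sum(weights.values())
    return sum(weights[name] * model_preds[name] for name in weights) / total


def prediction_spread(model_preds: dict) -> float:
    """Standard deviation across the models' predictions (assumed: population)."""
    return pstdev(list(model_preds.values()))


# Made-up predictions for a single listing, in dollars.
model_preds = {"ridge": 1400.0, "random_forest": 1500.0, "gradient_boosting": 1600.0}

predicted = ensemble_predict(model_preds, WEIGHTS)
spread = prediction_spread(model_preds)
print(round(predicted, 2), round(spread, 2))
```

In this example the spread comes out a little under $82, so the listing would pass a pred_std ≤ $100 check; predictions of 1300, 1500, and 1700 would not.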

Interpretation and Use of Predictions

Predictions are intended to support analysis and exploration, not to replace human judgment. Outputs represent probabilistic estimates based on historical patterns and should be interpreted as indicative rather than guaranteed. Do not rely on predictions as the sole basis for high-risk decisions.