Compare commits
5 Commits
3361ec4e8e...main
| Author | SHA1 | Date |
|---|---|---|
| | 9153124ff7 | |
| | dca1aec8cc | |
| | 3b0e5b0e84 | |
| | 85f7f3ea8f | |
| | 3e50e8d1c5 | |
47
README.md
Normal file
@@ -0,0 +1,47 @@
# Big Data and Artificial Intelligence Midterm Assignment: MovieLens 1M Dataset Processing and Analysis

`matrix_factorization.py` uses matrix factorization to process the dataset and fill in missing ratings with predictions.

`analyzer.py` runs a preliminary analysis that combines the prediction-filled user-rating matrix with the original data.

`dataset` contains the MovieLens 1M dataset.

`result` contains the data after matrix-factorization processing and predictive filling.

`analysis_results` contains the results of the preliminary analysis of the data above.

## MovieLens 1M Dataset

The MovieLens 1M movie rating dataset contains about 1 million ratings of roughly 4,000 movies by roughly 6,000 users. It was released in February 2003.

[Dataset link](https://grouplens.org/datasets/movielens/1m/)
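
For orientation, the ratings file shipped with MovieLens 1M (`ratings.dat`) stores one rating per line in the form `UserID::MovieID::Rating::Timestamp`. The following is a minimal sketch of parsing that format and counting ratings per star value. The names `Rating`, `parse_ratings`, and `rating_distribution` are illustrative assumptions, not taken from this repository, and the sample lines are made up for the demonstration.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    timestamp: int


def parse_ratings(lines: Iterable[str]) -> List[Rating]:
    """Parse lines in the `UserID::MovieID::Rating::Timestamp` format."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, timestamp = (int(x) for x in line.split("::"))
        ratings.append(Rating(user_id, movie_id, rating, timestamp))
    return ratings


def rating_distribution(ratings: Iterable[Rating]) -> Dict[int, int]:
    """Count how often each star value occurs, sorted by star value."""
    counts = Counter(r.rating for r in ratings)
    return dict(sorted(counts.items()))


# Made-up sample lines in the dataset's format (not real records).
sample_lines = [
    "1::10::5::978300000",
    "1::20::3::978300100",
    "2::10::5::978300200",
]
sample_ratings = parse_ratings(sample_lines)
print(rating_distribution(sample_ratings))
```

To run this against the real data, pass an open file handle for `ratings.dat` to `parse_ratings` in place of `sample_lines`; the exact path inside `dataset` depends on how the archive was unpacked.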

## Dataset Processing and Predictive Filling: Matrix Factorization
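
The README does not say which matrix factorization variant `matrix_factorization.py` implements. As one common approach, the sketch below factors the rating matrix as `R ≈ P @ Q.T` by gradient descent on the observed entries only, then uses the product to fill the missing cells. The function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the toy matrix are all assumptions chosen for illustration.

```python
import numpy as np


def factorize(R, mask, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Factor R ≈ P @ Q.T with full-batch gradient descent.

    Only entries where `mask` is True contribute to the error;
    `reg` is an L2 penalty on both factor matrices.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors
    for _ in range(epochs):
        E = mask * (R - P @ Q.T)  # error on observed cells, zero elsewhere
        P_step = E @ Q - reg * P
        Q_step = E.T @ P - reg * Q
        P += lr * P_step
        Q += lr * Q_step
    return P, Q


def fill_missing(R, mask, **kwargs):
    """Keep observed ratings and fill the rest with clipped predictions."""
    P, Q = factorize(R, mask, **kwargs)
    pred = np.clip(P @ Q.T, 1.0, 5.0)  # ratings live on a 1 to 5 scale
    return np.where(mask, R, pred)


# Toy 5x4 matrix; 0 marks a missing rating (illustrative data only).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
mask = R > 0
print(np.round(fill_missing(R, mask), 2))
```

Observed ratings are passed through unchanged, so only the previously empty cells carry predicted values. On the full dataset a dense matrix of this kind is large, so a sparse or mini-batch formulation would likely be preferable.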

## Preliminary Data Analysis

### Users

- Age
- Gender
- Occupation
- Geographic distribution (ZIP code)

### Movies

- Genre
- Release year

### Rating Data

- Rating distribution
- ...

### Combined Analysis

- By user age
- By user gender
- By user occupation
- ...
@@ -3,7 +3,7 @@
|
|||||||
<html>
|
<html>
|
||||||
<head>
|
<head>
|
||||||
<meta charset="utf-8">
|
<meta charset="utf-8">
|
||||||
<title>MovieLens Dataset Analysis Report</title>
|
<title>MovieLens数据集分析报告</title>
|
||||||
<style>
|
<style>
|
||||||
body { font-family: Arial, sans-serif; margin: 0; padding: 20px; color: #333; }
|
body { font-family: Arial, sans-serif; margin: 0; padding: 20px; color: #333; }
|
||||||
.container { max-width: 1200px; margin: 0 auto; }
|
.container { max-width: 1200px; margin: 0 auto; }
|
||||||
@@ -23,90 +23,90 @@
|
|||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
<div class="container">
|
<div class="container">
|
||||||
<h1>MovieLens Dataset User-Movie Preference Analysis Report</h1>
|
<h1>MovieLens数据集用户-电影偏好分析报告</h1>
|
||||||
|
|
||||||
<div class="summary">
|
<div class="summary">
|
||||||
<h2>Data Overview</h2>
|
<h2>数据概览</h2>
|
||||||
<p>This analysis is based on the MovieLens dataset, containing 6040 users、3883 movies and 1000209 original rating records。</p>
|
<p>本分析基于MovieLens数据集,包含 6040 位用户、3883 部电影 和 1000209 条原始评分记录。</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<h2>User Profile Analysis</h2>
|
<h2>用户基本情况分析</h2>
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="user_analysis/gender_distribution.png" alt="用户性别分布">
|
<img src="user_analysis/gender_distribution.png" alt="User Gender Distribution">
|
||||||
<p class="caption">User Gender Distribution</p>
|
<p class="caption">用户性别分布</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="user_analysis/age_distribution.png" alt="用户年龄分布">
|
<img src="user_analysis/age_distribution.png" alt="User Age Distribution">
|
||||||
<p class="caption">User Age Distribution</p>
|
<p class="caption">用户年龄分布</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="user_analysis/occupation_distribution.png" alt="用户职业分布">
|
<img src="user_analysis/occupation_distribution.png" alt="User Occupation Distribution">
|
||||||
<p class="caption">User Occupation Distribution</p>
|
<p class="caption">用户职业分布</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<h2>Movie Distribution Analysis</h2>
|
<h2>电影分布情况分析</h2>
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="movie_analysis/genre_distribution.png" alt="电影类型分布">
|
<img src="movie_analysis/genre_distribution.png" alt="Movie Genre Distribution">
|
||||||
<p class="caption">Movie Genre Distribution</p>
|
<p class="caption">电影类型分布</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="movie_analysis/year_distribution.png" alt="电影发行年份分布">
|
<img src="movie_analysis/year_distribution.png" alt="Movie Release Year Distribution">
|
||||||
<p class="caption">Movie Release Year Distribution</p>
|
<p class="caption">电影发行年份分布</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="movie_analysis/most_rated_movies.png" alt="评分数量最多的电影">
|
<img src="movie_analysis/most_rated_movies.png" alt="Top 20 Most Rated Movies">
|
||||||
<p class="caption">Top 20 Most Rated Movies</p>
|
<p class="caption">评分数量最多的20部电影</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<h2>Rating Distribution Analysis</h2>
|
<h2>评分分布情况分析</h2>
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="rating_analysis/rating_distribution.png" alt="评分分布">
|
<img src="rating_analysis/rating_distribution.png" alt="Rating Distribution">
|
||||||
<p class="caption">Rating Distribution</p>
|
<p class="caption">评分分布情况</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="rating_analysis/genre_avg_ratings.png" alt="各类型电影的平均评分">
|
<img src="rating_analysis/genre_avg_ratings.png" alt="Average Rating by Movie Genre">
|
||||||
<p class="caption">Average Rating by Movie Genre</p>
|
<p class="caption">各类型电影的平均评分</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="rating_analysis/top_rated_movies.png" alt="评分最高的电影">
|
<img src="rating_analysis/top_rated_movies.png" alt="Top 20 Highest Rated Movies">
|
||||||
<p class="caption">Top 20 Highest Rated Movies (min. 100 ratings)</p>
|
<p class="caption">评分最高的20部电影(至少有100个评分)</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<h2>User Characteristics and Movie Preferences</h2>
|
<h2>用户特征与电影偏好分析</h2>
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="preference_analysis/gender_genre_heatmap.png" alt="不同性别的电影类型偏好">
|
<img src="preference_analysis/gender_genre_heatmap.png" alt="Movie Genre Preferences by Gender">
|
||||||
<p class="caption">Movie Genre Preferences by Gender</p>
|
<p class="caption">不同性别的电影类型偏好对比</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="preference_analysis/age_genre_heatmap.png" alt="不同年龄段的电影类型偏好">
|
<img src="preference_analysis/age_genre_heatmap.png" alt="Movie Genre Preferences by Age Group">
|
||||||
<p class="caption">Movie Genre Preferences by Age Group</p>
|
<p class="caption">不同年龄段的电影类型偏好对比</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="preference_analysis/age_year_heatmap.png" alt="不同年龄段对不同年代电影的偏好">
|
<img src="preference_analysis/age_year_heatmap.png" alt="Preferences for Movies by Decade Across Age Groups">
|
||||||
<p class="caption">Preferences for Movies by Decade Across Age Groups</p>
|
<p class="caption">不同年龄段对不同年代电影的偏好</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="figure">
|
<div class="figure">
|
||||||
<img src="preference_analysis/gender_avg_rating.png" alt="性别与平均评分的关系">
|
<img src="preference_analysis/gender_avg_rating.png" alt="Average Rating by Gender">
|
||||||
<p class="caption">Relationship Between Gender and Average Rating</p>
|
<p class="caption">性别与平均评分的关系</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<h2>Conclusions and Insights</h2>
|
<h2>结论与洞察</h2>
|
||||||
<p>Through in-depth analysis of the MovieLens dataset, we found significant correlations between user characteristics (gender, age, occupation) and movie preferences. Key findings include:</p>
|
<p>通过对MovieLens数据集的深入分析,我们发现了用户特征(如性别、年龄、职业)与电影偏好之间存在显著关联。主要结论包括:</p>
|
||||||
<ul>
|
<ul>
|
||||||
<li>Significant differences in movie genre preferences between genders</li>
|
<li>不同性别用户在电影类型偏好上存在明显差异</li>
|
||||||
<li>Age influences how users rate movies from different decades</li>
|
<li>年龄因素会影响用户对不同年代电影的评价</li>
|
||||||
<li>Occupational background correlates with genre preferences</li>
|
<li>职业背景与电影类型偏好具有相关性</li>
|
||||||
</ul>
|
</ul>
|
||||||
<p>These findings provide valuable reference for designing movie recommendation systems and developing movie marketing strategies.</p>
|
<p>这些发现对于电影推荐系统的设计和电影营销策略制定具有重要参考价值。</p>
|
||||||
</div>
|
</div>
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||
|
|||||||
@@ -1,28 +1,28 @@
|
|||||||
{
|
{
|
||||||
"Data Overview": {
|
"数据概览": {
|
||||||
"Number of Users": 6040,
|
"用户数量": 6040,
|
||||||
"Number of Movies": 3883,
|
"电影数量": 3883,
|
||||||
"Original Ratings Count": 1000209,
|
"原始评分数量": 1000209,
|
||||||
"Filled Ratings Count": 22384240
|
"填补后评分数量": 22384240
|
||||||
},
|
},
|
||||||
"User Analysis": {
|
"用户分析": {
|
||||||
"Gender Distribution": {
|
"性别分布": {
|
||||||
"M": 4331,
|
"M": 4331,
|
||||||
"F": 1709
|
"F": 1709
|
||||||
},
|
},
|
||||||
"Age Distribution": {
|
"年龄分布": {
|
||||||
"25-34": 2096,
|
"25-34": 2096,
|
||||||
"35-44": 1193,
|
"35-44": 1193,
|
||||||
"18-24": 1103,
|
"18-24": 1103,
|
||||||
"45-49": 550,
|
"45-49": 550,
|
||||||
"50-55": 496,
|
"50-55": 496,
|
||||||
"56+": 380,
|
"56岁以上": 380,
|
||||||
"Under 18": 222
|
"18岁以下": 222
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"Rating Analysis": {
|
"评分分析": {
|
||||||
"Average Rating": 3.58,
|
"平均评分": 3.58,
|
||||||
"Rating Distribution": {
|
"评分分布": {
|
||||||
"1": 56174,
|
"1": 56174,
|
||||||
"2": 107557,
|
"2": 107557,
|
||||||
"3": 261197,
|
"3": 261197,
|
||||||
BIN
analysis_results/preference_analysis/age_18岁以下_preferences.png
Normal file
|
After Width: | Height: | Size: 22 KiB |
BIN
analysis_results/preference_analysis/age_56岁以上_preferences.png
Normal file
|
After Width: | Height: | Size: 22 KiB |