Relationship Analysis

To explore the key figures in The Observatory Review newspaper and the relationships among them, we built a custom chart that shows how strongly the people in the data are connected to one another.

Diagram Interpretation

Fig. 1. Relationship diagram of key figures in 1953

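In this figure, the size of the circle around each name represents that person's weight for the year, that is, how many times the name is mentioned. The lines connecting names indicate the strength of their relationship: the thicker the line, the more often the two people are mentioned together.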
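The mapping from counts to drawing sizes can be sketched as follows. The constants (3 for node size; 10 and 0.07 for edge width) are the ones used in yield_relationship. The helper visual_sizes and its input dictionaries are hypothetical, and the counts are assumed to have been computed already.

```python
# Hypothetical helper: convert raw counts into the drawing sizes used in the diagram.
# mentions: name -> mention count; cooccurrences: (name1, name2) -> co-occurrence count.
def visual_sizes(mentions, cooccurrences):
    # node_size (matplotlib marker area, in points^2) grows linearly with mentions.
    node_sizes = {name: count * 3 for name, count in mentions.items()}
    # The edge weight is stored as 10 x count and drawn with width = 0.07 x weight,
    # so the line width works out to 0.7 x the co-occurrence count.
    edge_widths = {pair: (10 * count) * 0.07 for pair, count in cooccurrences.items()}
    return node_sizes, edge_widths


if __name__ == "__main__":
    # Toy counts for illustration only, not taken from the 1953 data.
    sizes, widths = visual_sizes({"A": 40, "B": 10}, {("A", "B"): 5})
    print(sizes)  # {'A': 120, 'B': 30}
    print(widths)
```

Because both mappings are linear, a name mentioned twice as often gets a circle with twice the area, which keeps visual comparisons between people proportional.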

In this figure, Mao Zedong (毛澤東), Eisenhower (艾森豪), and Truman (杜魯門) have relatively high weights, meaning they were the figures the newspaper covered most in 1953. At the same time, Truman (杜魯門), Eisenhower (艾森豪), and Churchill (邱吉爾) are closely connected. This suggests that the newspaper often reported on these three together, which in turn hints that Anglo-American relations were a focus of its coverage that year.
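Reading "closeness" off the diagram amounts to looking up the heaviest edges for a given person. A minimal sketch, assuming a nested dictionary shaped like the relationships dictionary built in yield_relationship (name -> {other name -> co-occurrence count}). The helper strongest_partners and the toy counts are hypothetical and are not the 1953 results.

```python
def strongest_partners(relationships, name, k=2):
    """Return the k people most often co-mentioned with `name`, as (name, count) pairs."""
    partners = relationships.get(name, {})
    # Sort by count (descending), then by name so ties come out in a stable order.
    ranked = sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy counts for illustration only.
    toy = {"A": {"B": 5, "C": 4, "D": 1}}
    print(strongest_partners(toy, "A"))  # [('B', 5), ('C', 4)]
```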

Code

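In short, the function yield_relationship(input_file, output_file, size) works in three steps. First, it extracts person names from the text (words that the segmenter pseg.cut tags as nr, for person name, and that are at least two characters long), counts how often each one appears, and keeps the size most frequent names in descending order. Next, it counts how many times each pair of names appears in the same paragraph (one line of the input file), adding one to the pair's weight for every co-occurrence. Finally, it writes the name weights and pair weights to two CSV files, reads them back, and draws the relationship diagram from them.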
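The three steps can be sketched without the tokenizer or the plotting libraries. This is a minimal, standard-library-only sketch that assumes the names in each paragraph have already been extracted (the original uses part-of-speech tagging for that). The function build_tables and its min_edge parameter are hypothetical; min_edge=2 mirrors the original rule of keeping only pairs that co-occur more than once.

```python
import csv
import io
from collections import Counter


def build_tables(paragraph_names, size, min_edge=2):
    """Return (top names, node CSV text, edge CSV text).

    paragraph_names: one list of already extracted names per paragraph.
    """
    # Step 1: count mentions and keep the `size` most frequent names, highest first.
    mentions = Counter(name for names in paragraph_names for name in names)
    ranked = sorted(mentions.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    top = [name for name, _ in ranked[:size]]

    # Step 2: every ordered pair of distinct names in a paragraph adds 1 to that pair.
    pairs = Counter()
    for names in paragraph_names:
        for a in names:
            for b in names:
                if a != b:
                    pairs[(a, b)] += 1

    # Step 3: write node and edge tables (in memory here, instead of files on disk).
    nodes = io.StringIO()
    node_writer = csv.writer(nodes, lineterminator="\n")
    node_writer.writerow(["Id", "Label", "Weight"])
    for name, count in mentions.items():
        node_writer.writerow([name, name, count])

    edges = io.StringIO()
    edge_writer = csv.writer(edges, lineterminator="\n")
    edge_writer.writerow(["Source", "Target", "Weight"])
    for (a, b), count in pairs.items():
        if count >= min_edge:
            edge_writer.writerow([a, b, count])

    return top, nodes.getvalue(), edges.getvalue()


if __name__ == "__main__":
    # Toy input: the names found in three paragraphs.
    top, node_csv, edge_csv = build_tables([["A", "B"], ["A", "B", "C"], ["A"]], size=2)
    print(top)  # ['A', 'B']
    print(edge_csv)
```

Each qualifying pair is written in both directions (A to B and B to A). That is harmless when the table is loaded into an undirected networkx graph, because adding the same edge a second time simply overwrites its weight with the same value.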

import csv
import os
import re

import jieba.posseg as pseg
import matplotlib as mpl
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
from tqdm import tqdm


def yield_relationship(input_file, output_file, size):
    try:
        # input_file is the Traditional Chinese text to read; undecodable bytes are skipped via errors="ignore"
        with open(input_file, encoding='utf-8', errors='ignore') as fp:
            lines = fp.readlines()
    except OSError:
        return
    # name -> number of mentions
    names = {}
    # name -> {co-mentioned name -> number of co-occurrences}
    relationships = {}
    # names found in each paragraph (one list per input line)
    lineNames = []

    for line in tqdm(lines):
        line = line.strip('\n')  # remove the trailing newline
        # remove ASCII letters and digits (str.strip treats its argument as a character set, not a regex)
        line = re.sub(r'[a-zA-Z0-9]', '', line)
        poss = pseg.cut(line)  # segment the text and tag each word's part of speech

        # start a new list of names for this paragraph
        lineNames.append([])
        for w in poss:
            # skip words that are not person names ("nr") or are shorter than two characters
            if w.flag != "nr" or len(w.word) < 2:
                continue
            lineNames[-1].append(w.word)  # record the name for the current paragraph

            if w.word not in names:  # first time this name is seen
                names[w.word] = 0
                relationships[w.word] = {}  # initialize this person's relationship map
            # increment the mention count
            names[w.word] += 1

    # keep the `size` most frequently mentioned names, in descending order
    names_sort = sorted(names.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)[:size]
    name_dict = dict(names_sort)
    namelist = [name for name, _ in names_sort]

    # count co-occurrences: each time two names share a paragraph, their relationship grows by 1
    for line in lineNames:
        for name1 in line:
            for name2 in line:
                if name1 == name2:
                    continue
                relationships[name1][name2] = relationships[name1].get(name2, 0) + 1

    os.makedirs("./diagram_csv", exist_ok=True)

    # node table: person weights
    with open("./diagram_csv/earth_node.csv", "w", encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label", "Weight"])
        for name, times in names.items():
            writer.writerow([name, name, str(times)])

    # edge table: only pairs that co-occur more than once
    with open("./diagram_csv/earth_edge.csv", "w", encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for name, edge in relationships.items():
            for v, w in edge.items():
                if w > 1:
                    writer.writerow([name, v, str(w)])

    edge_graph = pd.read_csv("./diagram_csv/earth_edge.csv", encoding='utf-8')  # relationship file
    edge_graph = edge_graph.dropna()  # drop empty rows

    mpl.rcParams['font.sans-serif'] = ['SimHei']  # a font that can render Chinese labels
    mpl.rcParams['axes.unicode_minus'] = False
    G = nx.Graph()  # undirected graph of how closely people are related

    for _, row in edge_graph.iterrows():
        if row['Source'] in namelist:
            G.add_node(row['Source'])  # add the current person as a node

    for _, row in edge_graph.iterrows():
        if row['Source'] in namelist and row['Target'] in namelist:
            # edge weight = 10 x co-occurrence count
            G.add_edge(row['Source'], row['Target'], weight=10 * row['Weight'])

    # compute sizes after all nodes and edges exist so the lists line up with G.nodes() and G.edges()
    node_size = [name_dict[node] * 3 for node in G.nodes()]
    edge_width = [d['weight'] * 0.07 for _, _, d in G.edges(data=True)]

    pos = nx.shell_layout(G)
    # draw the node-link diagram and save it
    nx.draw_networkx(G, pos, with_labels=True, node_color=list(range(G.number_of_nodes())),
                     edge_color='red', node_size=node_size, font_size=11, alpha=0.5, width=edge_width)
    plt.savefig(output_file)
    plt.close()
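One design choice worth noting: the loops in yield_relationship count ordered pairs, so every pair is recorded twice (A to B and B to A), and a name mentioned twice in one paragraph counts twice toward each of its pairs. If the intended measure is "number of paragraphs in which two people appear together", a variant like the following counts each unordered pair at most once per paragraph. This is an alternative sketch, not the behavior of the original function; count_pairs_per_paragraph is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def count_pairs_per_paragraph(paragraph_names):
    """Count, for each unordered pair of names, the paragraphs that mention both."""
    pairs = Counter()
    for names in paragraph_names:
        # set() ignores repeated mentions within a paragraph;
        # sorted() gives each pair one canonical (a, b) order.
        for a, b in combinations(sorted(set(names)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    counts = count_pairs_per_paragraph([["B", "A", "B"], ["A", "B", "C"]])
    print(dict(counts))  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}
```

With this variant, the existing "weight > 1" filter would read as "co-mentioned in at least two paragraphs", which is easier to explain when interpreting the diagram.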