Hierarchical clustering is an intuitive algorithm that builds clusters layer by layer. Small clusters can be merged from the bottom up (agglomerative clustering), or a large cluster can be split from the top down (divisive clustering). The bottom-up approach is generally the more commonly used. At each step, it finds the two clusters separated by the shortest distance and merges them into a larger cluster, repeating until all data points belong to a single cluster. The whole process builds a tree structure.
Figure 1. Hierarchical clustering.
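The bottom-up procedure can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used by the service: the function names `cluster_distance` and `agglomerate`, the sample points, and the choice of average pairwise Euclidean distance as the cluster distance are all assumptions made for the example.

```python
from itertools import combinations
import math


def cluster_distance(points, a, b):
    """Average pairwise Euclidean distance between clusters a and b.

    a and b are lists of point indices. The averaging rule is an
    assumption made for this sketch.
    """
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points):
    """Repeatedly merge the two closest clusters and return the merge tree.

    A tree is either a point index (leaf) or a (left, right) pair.
    """
    # Start with every point as its own cluster: (tree, member indices).
    clusters = [(i, [i]) for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the shortest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                points, clusters[ij[0]][1], clusters[ij[1]][1]
            ),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical sample data: two tight pairs and one outlying point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
tree = agglomerate(points)
print(tree)
```

The nested pairs in `tree` record which clusters were joined at each level, which is the tree structure described above. The search over all cluster pairs at every step keeps the sketch short but is too slow for large data sets.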
How is the distance between two clusters measured? At the start, each data point is a cluster by itself, so the distance between two clusters is simply the distance between the two points. For clusters containing more than one data point, several methods are available. The most commonly used is average linkage, which takes the average of the pairwise distances between the data points of the two clusters. Similarly, single linkage and complete linkage take the distance between the closest and the farthest pair of data points in the two clusters, respectively, as the cluster distance.
Figure 2. Hierarchical clustering by the average-linkage method.
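To make the three linkage criteria concrete, the following sketch computes each of them for two small clusters. The helper names (`pairwise`, `single_linkage`, `complete_linkage`, `average_linkage`), the use of Euclidean distance, and the sample points are assumptions chosen for illustration.

```python
import math


def pairwise(a, b):
    """All Euclidean distances between a point in cluster a and a point in b."""
    return [math.dist(p, q) for p in a for q in b]


def single_linkage(a, b):
    """Distance between the closest pair of points across the two clusters."""
    return min(pairwise(a, b))


def complete_linkage(a, b):
    """Distance between the farthest pair of points across the two clusters."""
    return max(pairwise(a, b))


def average_linkage(a, b):
    """Mean of all pairwise distances across the two clusters."""
    distances = pairwise(a, b)
    return sum(distances) / len(distances)


# Hypothetical clusters: the corners of a 4-by-3 rectangle, split by side.
a = [(0.0, 0.0), (0.0, 3.0)]
b = [(4.0, 0.0), (4.0, 3.0)]

print(single_linkage(a, b))    # 4.0 (a horizontal edge)
print(complete_linkage(a, b))  # 5.0 (a diagonal)
print(average_linkage(a, b))   # 4.5 (mean of 4, 5, 5, 4)
```

The same two clusters are therefore 4.0, 4.5, or 5.0 apart depending on the criterion, which is why the choice of linkage can change the order in which clusters are merged.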
CD ComputaBio calculates the similarity between nodes using a chosen similarity measure, sorts the node pairs from highest to lowest similarity, and progressively connects the nodes in that order. An advantage of this method is that the clustering can be stopped at any point. Customers can provide either a distance matrix or raw data to perform hierarchical clustering; if raw data is provided, we calculate the distance matrix automatically. The main steps are as follows:
|Hierarchical Clustering Service
A further advantage of hierarchical clustering is that it captures the entire clustering process in a single run. Once a cluster tree like the one above has been obtained, you can read off the result for any number of clusters directly from the tree structure, with no need to recalculate the cluster assignment of each data point.
|Depends on the time you need to simulate and the time required for the system to reach equilibrium.
|Product delivery mode
|The simulation results provide you with the raw data and analysis results of molecular dynamics.
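Two points made above lend themselves to a short sketch: the input may be either raw data or a precomputed distance matrix, and once the merge tree exists, any number of clusters can be read from it without recomputing distances. The code below is a minimal illustration under stated assumptions, not the service's actual pipeline: `distance_matrix`, `merge_history`, and `cut` are hypothetical names, and Euclidean distance with average linkage is assumed.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances for raw data (an assumed metric)."""
    return [[math.dist(p, q) for q in points] for p in points]


def merge_history(dist):
    """Run average-linkage merging once and return the ordered merges.

    dist is a square distance matrix. Each entry of the result is
    (cluster_id_u, cluster_id_v, distance); the merged cluster receives
    the next unused id, starting at len(dist).
    """
    clusters = {i: [i] for i in range(len(dist))}  # id -> member indices
    history = []
    next_id = len(dist)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, u, v = best
        clusters[next_id] = clusters.pop(u) + clusters.pop(v)
        history.append((u, v, d))
        next_id += 1
    return history


def cut(history, n_points, k):
    """Replay the first n_points - k merges to obtain k clusters.

    No distances are recomputed; only the stored merge order is used.
    """
    clusters = {i: [i] for i in range(n_points)}
    for step, (u, v, _) in enumerate(history[: n_points - k]):
        clusters[n_points + step] = clusters.pop(u) + clusters.pop(v)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical raw data; a precomputed matrix could be passed to
# merge_history directly instead of calling distance_matrix.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
history = merge_history(distance_matrix(points))

two = cut(history, len(points), 2)
three = cut(history, len(points), 3)
print(two)    # [[0, 1, 2, 3], [4]]
print(three)  # [[0, 1], [2, 3], [4]]
```

The expensive step, `merge_history`, runs once. Changing the number of clusters afterwards only replays a prefix of the stored merges.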
CD ComputaBio provides a professional hierarchical clustering service. Our hierarchical clustering service has proven to be very useful for understanding the biochemical basis of physiological events at different stages of drug development, and even in other fields such as materials science. The CD ComputaBio team has worked in this field for more than a decade and has published its findings in top scientific journals. If you need network analysis services, please feel free to contact us.