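Assignment Overview
Course this assignment belongs to | Software Engineering |
---|---|
Assignment requirements | Individual Project |
Assignment goal | Build a paper plagiarism-checking program |
GitHub
https://github.com/lianghongjun/3121004956
PSP Table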
PSP2.1 | Personal Software Process Stages | Estimated time (minutes) | Actual time (minutes) |
---|---|---|---|
Planning | Planning | 60 | 60 |
Estimate | Estimate how long the task will take | 60 | 60 |
Development | Development | 600 | 700 |
Analysis | Requirements analysis | 200 | 300 |
Design Spec | Write the design document | 30 | 60 |
Design Review | Design review | 10 | 20 |
Coding Standard | Coding standard | 30 | 40 |
Design | Detailed design | 60 | 80 |
Coding | Implementation | 300 | 500 |
Code Review | Code review | 30 | 60 |
Test | Testing | 60 | 90 |
Reporting | Reporting | 300 | 240 |
Test Report | Test report | 30 | 60 |
Size Measurement | Measure the workload | 60 | 60 |
Postmortem & Process Improvement Plan | Postmortem and process improvement plan | 60 | 60 |
Total | | 1890 | 2270 |
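Interface Design and Implementation
1.1 Overall flow
1.2 Classes
- Main: the main method
  1. Read the files at the paths given on the command line and convert their contents to strings
  2. Compute the simHash value of each string
  3. Derive the similarity (i.e. the plagiarism rate) from the simHash values
  4. Write the plagiarism rate to the result file
  5. Exit
- TxtUtils: reads and writes txt files
  1. readTxt: reads a txt file
  2. writeTxt: writes a txt file
- SimHashUtils: computes SimHash values
  1. getHash: computes the hash of a string and returns it as a String
  2. getSimHash: computes the simHash of a string and returns it as a String
- HammingUtils: computes the Hamming distance
  1. getHammingDistance: takes two simHash values and computes their Hamming distance
  2. getSimilarity: takes two simHash values and returns their similarity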
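To make the two HammingUtils methods concrete, here is a minimal sketch. It is not the project's actual code: it assumes simHash values are equal-length strings of '0' and '1', and the class name `HammingSketch`, the `-1` result for mismatched lengths, and the `1 - distance / length` similarity formula are all assumptions made for illustration.

```java
/** Illustrative sketch only; not the project's actual HammingUtils. */
final class HammingSketch {

    /** Counts the positions at which two equal-length bit strings differ. */
    static int getHammingDistance(String simHash1, String simHash2) {
        if (simHash1.length() != simHash2.length()) {
            // Assumption: mismatched lengths are reported as -1 rather than an exception.
            return -1;
        }
        int distance = 0;
        for (int i = 0; i < simHash1.length(); i++) {
            if (simHash1.charAt(i) != simHash2.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    /** Maps the distance to a similarity in [0, 1] as 1 - distance / bit length. */
    static double getSimilarity(String simHash1, String simHash2) {
        int distance = getHammingDistance(simHash1, simHash2);
        if (distance < 0 || simHash1.isEmpty()) {
            throw new IllegalArgumentException("fingerprints must be non-empty and of equal length");
        }
        return 1.0 - (double) distance / simHash1.length();
    }

    public static void main(String[] args) {
        System.out.println(getHammingDistance("10101010", "10100110")); // prints 2
        System.out.println(getSimilarity("10101010", "10100110"));      // prints 0.75
    }
}
```

With 128-bit fingerprints the same formula becomes `1 - distance / 128`, which is the ratio the unit tests print as a percentage.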
1.3 Core algorithm
SimHash + Hamming distance
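The SimHash step can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: tokens are overlapping two-character shingles with weight 1 (a real checker would more likely use word segmentation with keyword weights), MD5 supplies the 128-bit token hash, and `SimHashSketch` is a hypothetical class name. The 128-bit width matches the divisor used in the unit tests.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch only; not the project's actual SimHashUtils. */
final class SimHashSketch {
    static final int BITS = 128;

    /** Hashes a token with MD5 and returns the digest as a 128-character bit string. */
    static String getHash(String token) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(token.getBytes(StandardCharsets.UTF_8));
            StringBuilder bits = new StringBuilder(new BigInteger(1, digest).toString(2));
            // toString(2) drops leading zeros, so pad back to the full width.
            while (bits.length() < BITS) {
                bits.insert(0, '0');
            }
            return bits.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Builds the fingerprint by letting every token vote +1 or -1 on each bit. */
    static String getSimHash(String text) {
        int[] votes = new int[BITS];
        // Assumption: overlapping two-character shingles, each with weight 1.
        for (int i = 0; i + 2 <= text.length(); i++) {
            String hash = getHash(text.substring(i, i + 2));
            for (int b = 0; b < BITS; b++) {
                votes[b] += hash.charAt(b) == '1' ? 1 : -1;
            }
        }
        // A bit is 1 when its votes sum to a positive value, otherwise 0.
        StringBuilder simHash = new StringBuilder(BITS);
        for (int vote : votes) {
            simHash.append(vote > 0 ? '1' : '0');
        }
        return simHash.toString();
    }
}
```

Because every token only nudges the per-bit vote totals, a small edit to the text flips few bits of the fingerprint, so similar texts end up with a small Hamming distance. Text shorter than two characters yields no shingles and therefore an all-zero fingerprint in this sketch.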
Interface Performance Improvements
Unit Tests
- TxtUtilsTest
public class TxtUtilsTest {

    @Test
    public void readTxtTest() {
        // Path exists: the file is read successfully
        String str = TxtUtils.readTxt("D:/test/orig.txt");
        String[] strings = str.split(" ");
        for (String string : strings) {
            System.out.println(string);
        }
    }

    @Test
    public void readTxtFailTest() {
        // Path does not exist: reading the file fails
        TxtUtils.readTxt("D:/test/123.txt");
    }

    @Test
    public void writeTxtTest() {
        // Path exists: the result file is written successfully
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (double v : elem) {
            TxtUtils.writeTxt(v, "D:/test/result.txt");
        }
    }

    @Test
    public void writeTxtFailTest() {
        // Invalid path: writing the result file fails
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (double v : elem) {
            TxtUtils.writeTxt(v, "abc:/test/result.txt");
        }
    }
}

- SimHashUtilsTest
public class SimHashUtilsTest {

    @Test
    public void getHashTest() {
        String[] strings = {"Hello", "Hi", "Good", "Bad"};
        for (String string : strings) {
            String hash = SimHashUtils.getHash(string);
            System.out.println(hash.length());
            System.out.println(hash);
        }
    }

    @Test
    public void getSimHashTest() {
        String origTxt = TxtUtils.readTxt("D:/test/orig.txt");
        String copyTxt = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        System.out.println(SimHashUtils.getSimHash(origTxt));
        System.out.println(SimHashUtils.getSimHash(copyTxt));
    }
}

- HammingUtilsTest
import java.util.Objects;

public class HammingUtilsTest {

    @Test
    public void getHammingDistanceTest() {
        String str0 = TxtUtils.readTxt("D:/test/orig.txt");
        String str1 = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        int distance = HammingUtils.getHammingDistance(
                Objects.requireNonNull(SimHashUtils.getSimHash(str0)),
                Objects.requireNonNull(SimHashUtils.getSimHash(str1)));
        System.out.println("Hamming distance: " + distance);
        // Integer arithmetic: the percentage is truncated to a whole number
        System.out.println("Similarity: " + (100 - distance * 100 / 128) + "%");
    }

    @Test
    public void getHammingDistanceFailTest() {
        // Inputs of different lengths
        String str0 = "10101010101";
        String str1 = "10101010";
        System.out.println(HammingUtils.getHammingDistance(str0, str1));
    }

    @Test
    public void getSimilarityTest() {
        String origTxt = TxtUtils.readTxt("D:/test/orig.txt");
        String copyTxt = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        int distance = HammingUtils.getHammingDistance(
                Objects.requireNonNull(SimHashUtils.getSimHash(origTxt)),
                Objects.requireNonNull(SimHashUtils.getSimHash(copyTxt)));
        double similarity = HammingUtils.getSimilarity(
                SimHashUtils.getSimHash(origTxt), SimHashUtils.getSimHash(copyTxt));
        System.out.println("Hamming distance between orig and copy: " + distance);
        System.out.println("Similarity between orig and copy: " + similarity);
    }
}

- ContentShortExceptionTest
public class ContentShortExceptionTest {

    @Test
    public void shortStringExceptionTest() {
        // Text shorter than 200 characters
        System.out.println(SimHashUtils.getSimHash("练习时长两年半"));
    }
}

- MainTest
public class MainTest {

    // Compare the original with itself
    @Test
    public void origTest() {
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig.txt");
        String resultFileName = "D:/test/resultOrigTest.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void addTest() {
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        String resultFileName = "D:/test/resultAddTest.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void delTest() {
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_del.txt");
        String resultFileName = "D:/test/resultDelTest.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void dis1Test() {
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_dis_1.txt");
        String resultFileName = "D:/test/resultDis1Test.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void dis10Test() {
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_dis_10.txt");
        String resultFileName = "D:/test/resultDis10Test.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void dis15Test() {
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_dis_15.txt");
        String resultFileName = "D:/test/resultDis15Test.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    // Compare the original against every sample file, itself included
    @Test
    public void allTest() {
        String[] str = new String[6];
        str[0] = TxtUtils.readTxt("D:/test/orig.txt");
        str[1] = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        str[2] = TxtUtils.readTxt("D:/test/orig_0.8_del.txt");
        str[3] = TxtUtils.readTxt("D:/test/orig_0.8_dis_1.txt");
        str[4] = TxtUtils.readTxt("D:/test/orig_0.8_dis_10.txt");
        str[5] = TxtUtils.readTxt("D:/test/orig_0.8_dis_15.txt");
        String resultFileName = "D:/test/resultAll.txt";
        for (int i = 0; i <= 5; i++) {
            double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str[0]), SimHashUtils.getSimHash(str[i]));
            TxtUtils.writeTxt(result, resultFileName);
        }
    }
}


Exception Handling
- ContentShortException
Handles text that is too short
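One way such a length check could look is sketched below. Everything here is an assumption made for illustration: the names `ContentShortExceptionSketch` and `LengthGuard` are hypothetical, the choice of a checked exception is a guess, and the 200-character threshold is taken only from the comment in the unit test above. The project's real ContentShortException may differ on all of these points.

```java
/** Hypothetical stand-in for the project's ContentShortException. */
class ContentShortExceptionSketch extends Exception {
    ContentShortExceptionSketch(String message) {
        super(message);
    }
}

/** Hypothetical helper that rejects text too short to fingerprint reliably. */
final class LengthGuard {
    // Assumption: threshold taken from the unit test comment ("shorter than 200").
    static final int MIN_LENGTH = 200;

    /** Returns the text unchanged, or throws if it is null or shorter than MIN_LENGTH. */
    static String requireLongEnough(String text) throws ContentShortExceptionSketch {
        int length = (text == null) ? 0 : text.length();
        if (length < MIN_LENGTH) {
            throw new ContentShortExceptionSketch(
                    "text too short: " + length + " characters, need at least " + MIN_LENGTH);
        }
        return text;
    }
}
```

Running the check before computing the simHash keeps very short inputs, which produce too few tokens for a meaningful fingerprint, from being silently scored.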
