Software Engineering Assignment 2: Individual Project

Assignment Overview

Course | Software Engineering
Assignment requirements | Individual Project
Assignment goal | Build a paper plagiarism checker

GitHub

https://github.com/lianghongjun/3121004956

PSP Table

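PSP2.1 | Personal Software Process Stages | Estimated time (minutes) | Actual time (minutes)
Planning | Planning | 60 | 60
Estimate | Estimate how much time the task will take | 60 | 60
Development | Development | 600 | 700
Analysis | Requirements analysis | 200 | 300
Design Spec | Write the design document | 30 | 60
Design Review | Design review | 10 | 20
Coding Standard | Coding standard | 30 | 40
Design | Detailed design | 60 | 80
Coding | Implementation | 300 | 500
Code Review | Code review | 30 | 60
Test | Testing | 60 | 90
Reporting | Reporting | 300 | 240
Test Report | Test report | 30 | 60
Size Measurement | Measure the amount of work | 60 | 60
Postmortem & Process Improvement Plan | Postmortem and process improvement plan | 60 | 60
Total | | 1890 | 2270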

Interface Design and Implementation

1.1 Overall Flow

1.2 Classes

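  • Main: the entry point
  1. Read the files at the paths given on the command line and convert their contents to strings
  2. Compute the simHash value of each string
  3. Derive the similarity (the paper's duplication rate) from the simHash values
  4. Write the duplication rate to the result file
  5. Exit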
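  • TxtUtils: reads and writes txt files

    1. readTxt: reads a txt file

    2. writeTxt: writes a txt file

  • SimHashUtils: computes SimHash values

    1. getHash: computes the hash of a string and returns it as a string

    2. getSimHash: computes the simHash of a string and returns it as a string

  • HammingUtils: computes the Hamming distance

    1. getHammingDistance: takes two simHash values and returns their Hamming distance

    2. getSimilarity: takes two simHash values and returns their similarity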
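
To make the two HammingUtils methods more concrete, here is a minimal sketch of how they might look when each simHash is a string of '0' and '1' characters. The class name HammingSketch, the exception thrown for mismatched lengths, and the 0.0 to 1.0 similarity scale are assumptions for illustration, not the project's actual code.

```java
// Hypothetical sketch: class name and error handling are assumptions.
class HammingSketch {

    /** Counts the positions at which two equal-length bit strings differ. */
    static int getHammingDistance(String simHash1, String simHash2) {
        if (simHash1.length() != simHash2.length()) {
            // Assumption: mismatched lengths are rejected instead of compared.
            throw new IllegalArgumentException("simHash values must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < simHash1.length(); i++) {
            if (simHash1.charAt(i) != simHash2.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    /** Maps the distance to a similarity in [0, 1] as 1 - distance / bitLength. */
    static double getSimilarity(String simHash1, String simHash2) {
        int distance = getHammingDistance(simHash1, simHash2);
        return 1.0 - (double) distance / simHash1.length();
    }

    public static void main(String[] args) {
        // The two strings differ in exactly two of their eight positions.
        System.out.println(getHammingDistance("10101010", "10100110")); // prints 2
        System.out.println(getSimilarity("10101010", "10100110"));      // prints 0.75
    }
}
```

Dividing by the bit length keeps the result independent of the fingerprint size, so the same formula works for 64-bit and 128-bit fingerprints.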

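1.3 Core Algorithm

SimHash + Hamming distance: each text is reduced to a simHash fingerprint, and the Hamming distance between two fingerprints is converted into a similarity.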
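
The SimHash step can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name SimHashSketch, the use of MD5 as the per-token hash, the 128-bit length, and the split into overlapping two-character tokens with equal weight are all choices made for this example. A real implementation would more likely use weighted keywords from a word segmenter.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: tokenization, weights and hash function are assumptions.
class SimHashSketch {
    static final int BITS = 128; // MD5 digests are 128 bits long

    /** Returns the MD5 digest of a token as a 128-character string of '0' and '1'. */
    static String getHash(String token) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(token.getBytes(StandardCharsets.UTF_8));
            String bits = new BigInteger(1, digest).toString(2);
            // Left-pad with zeros so every hash has exactly BITS characters.
            StringBuilder padded = new StringBuilder(BITS);
            for (int i = bits.length(); i < BITS; i++) {
                padded.append('0');
            }
            return padded.append(bits).toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Builds the fingerprint by letting every token vote on each bit position. */
    static String getSimHash(String text) {
        int[] votes = new int[BITS];
        // Assumption: overlapping two-character tokens, each with weight 1.
        for (int i = 0; i + 2 <= text.length(); i++) {
            String hash = getHash(text.substring(i, i + 2));
            for (int b = 0; b < BITS; b++) {
                votes[b] += hash.charAt(b) == '1' ? 1 : -1;
            }
        }
        // A bit is 1 when the votes for it are positive, otherwise 0.
        StringBuilder simHash = new StringBuilder(BITS);
        for (int vote : votes) {
            simHash.append(vote > 0 ? '1' : '0');
        }
        return simHash.toString();
    }

    public static void main(String[] args) {
        System.out.println(getSimHash("The quick brown fox jumps over the lazy dog"));
    }
}
```

Because similar texts share most of their tokens, most bit positions receive nearly the same votes, so their fingerprints differ in only a few bits. That is what makes the Hamming distance a usable similarity measure here.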

Interface Performance Improvements


Unit Tests

  • TxtUtilsTest

import org.junit.Test; // assumption: JUnit 4; with JUnit 5, import org.junit.jupiter.api.Test instead

public class TxtUtilsTest {

    @Test
    public void readTxtTest() {
        // The path exists, so the file is read successfully
        String str = TxtUtils.readTxt("D:/test/orig.txt");
        String[] strings = str.split(" ");
        for (String string : strings) {
            System.out.println(string);
        }
    }

    @Test
    public void readTxtFailTest() {
        // The path does not exist, so reading the file fails
        TxtUtils.readTxt("D:/test/123.txt");
    }

    @Test
    public void writeTxtTest() {
        // The path exists, so the result file is written successfully
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (double v : elem) {
            TxtUtils.writeTxt(v, "D:/test/result.txt");
        }
    }

    @Test
    public void writeTxtFailTest() {
        // The path is invalid, so writing the result file fails
        double[] elem = {0.11, 0.22, 0.33, 0.44, 0.55};
        for (double v : elem) {
            TxtUtils.writeTxt(v, "abc:/test/result.txt");
        }
    }

}
  • SimHashUtilsTest

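import org.junit.Test; // assumption: JUnit 4; with JUnit 5, import org.junit.jupiter.api.Test instead

public class SimHashUtilsTest {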

    @Test
    public void getHashTest(){
        String[] strings = {"Hello", "Hi", "Good", "Bad"};
        for (String string : strings) {
            String hash = SimHashUtils.getHash(string);
            System.out.println(hash.length());
            System.out.println(hash);
        }
    }
    @Test
    public void getSimHashTest(){
        String origTxt = TxtUtils.readTxt("D:/test/orig.txt");
        String copyTxt = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        System.out.println(SimHashUtils.getSimHash(origTxt));
        System.out.println(SimHashUtils.getSimHash(copyTxt));
    }

}
  • HammingUtilsTest

import java.util.Objects;

import org.junit.Test; // assumption: JUnit 4; with JUnit 5, import org.junit.jupiter.api.Test instead

public class HammingUtilsTest {

    @Test
    public void getHammingDistanceTest() {
        String str0 = TxtUtils.readTxt("D:/test/orig.txt");
        String str1 = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");

        int distance = HammingUtils.getHammingDistance(Objects.requireNonNull(SimHashUtils.getSimHash(str0)),
                Objects.requireNonNull(SimHashUtils.getSimHash(str1)));

        System.out.println("Hamming distance: " + distance);
        // The division by 128 assumes a 128-bit simHash
        System.out.println("Similarity: " + (100 - distance * 100 / 128) + "%");
    }

    @Test
    public void getHammingDistanceFailTest() {
        // Inputs of different lengths
        String str0 = "10101010101";
        String str1 = "10101010";
        System.out.println(HammingUtils.getHammingDistance(str0, str1));
    }

    @Test
    public void getSimilarityTest() {
        String origTxt = TxtUtils.readTxt("D:/test/orig.txt");
        String copyTxt = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");

        int distance = HammingUtils.getHammingDistance(Objects.requireNonNull(SimHashUtils.getSimHash(origTxt)),
                Objects.requireNonNull(SimHashUtils.getSimHash(copyTxt)));

        double similarity = HammingUtils.getSimilarity(SimHashUtils.getSimHash(origTxt), SimHashUtils.getSimHash(copyTxt));

        System.out.println("Hamming distance between orig and copy: " + distance);
        System.out.println("Similarity between orig and copy: " + similarity);
    }
}
  • ContentShortExceptionTest

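import org.junit.Test; // assumption: JUnit 4; with JUnit 5, import org.junit.jupiter.api.Test instead

public class ContentShortExceptionTest {

    @Test
    public void shortStringExceptionTest(){
        // The text is shorter than 200 characters
        System.out.println(SimHashUtils.getSimHash("练习时长两年半"));
    }
}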
  • MainTest

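import org.junit.Test; // assumption: JUnit 4; with JUnit 5, import org.junit.jupiter.api.Test instead

public class MainTest {
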
    @Test
    public void origTest(){
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig.txt");
        String resultFileName = "D:/test/resultOrigTest.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void addTest(){
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        String resultFileName = "D:/test/resultAddTest.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void delTest(){
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_del.txt");
        String resultFileName = "D:/test/resultDelTest.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void dis1Test(){
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_dis_1.txt");
        String resultFileName = "D:/test/resultDis1Test.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void dis10Test(){
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_dis_10.txt");
        String resultFileName = "D:/test/resultDis10Test.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void dis15Test(){
        String orig = TxtUtils.readTxt("D:/test/orig.txt");
        String copy = TxtUtils.readTxt("D:/test/orig_0.8_dis_15.txt");
        String resultFileName = "D:/test/resultDis15Test.txt";
        double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(orig), SimHashUtils.getSimHash(copy));
        TxtUtils.writeTxt(result, resultFileName);
    }

    @Test
    public void allTest(){
        String[] str = new String[6];
        str[0] = TxtUtils.readTxt("D:/test/orig.txt");
        str[1] = TxtUtils.readTxt("D:/test/orig_0.8_add.txt");
        str[2] = TxtUtils.readTxt("D:/test/orig_0.8_del.txt");
        str[3] = TxtUtils.readTxt("D:/test/orig_0.8_dis_1.txt");
        str[4] = TxtUtils.readTxt("D:/test/orig_0.8_dis_10.txt");
        str[5] = TxtUtils.readTxt("D:/test/orig_0.8_dis_15.txt");
        String resultFileName = "D:/test/resultAll.txt";
        for(int i = 0; i <= 5; i++){
            double result = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str[0]), SimHashUtils.getSimHash(str[i]));
            TxtUtils.writeTxt(result, resultFileName);
        }
    }
}

Exception Handling

  • ContentShortException
    Handles the case where the text is too short
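
As a rough illustration of how such an exception might be defined and raised, here is a minimal sketch. It assumes an unchecked exception and a minimum length of 200 characters (the figure mentioned in the test comment). The helper class ContentLengthCheck and its method requireLongEnough are hypothetical names used only for this example.

```java
// Hypothetical sketch: making the exception unchecked is an assumption.
class ContentShortException extends RuntimeException {
    ContentShortException(String message) {
        super(message);
    }
}

// Hypothetical helper that guards the SimHash computation.
class ContentLengthCheck {
    // Assumption: 200 characters is the minimum, as hinted by the test comment.
    static final int MIN_LENGTH = 200;

    /** Throws ContentShortException when the text is null or shorter than MIN_LENGTH. */
    static void requireLongEnough(String text) {
        if (text == null || text.length() < MIN_LENGTH) {
            throw new ContentShortException(
                    "Text is too short: at least " + MIN_LENGTH + " characters are required");
        }
    }

    public static void main(String[] args) {
        try {
            requireLongEnough("练习时长两年半");
        } catch (ContentShortException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Rejecting very short inputs early makes sense for SimHash, because a handful of tokens gives too few votes per bit for the fingerprint to be meaningful.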