poi解析Excel(三) XML定义解析规则

words: 2k    views:    time: 10min

相对于普通文本,Excel可以将表格数据描述得更结构化,它可以用不同的sheet存放不同的数据,并且在sheet中可指定各个字段的列头,在单元格中可以设置数据类型或格式校验。如果解析时能将这些结构信息利用并且支持可配,那么对解析处理将会有一些帮助

1. 问题及解决思路

由于Excel是多个sheet组成的,所以解析时要区分sheet处理,最好能定义哪些需要处理或者以怎样的顺序处理;对于sheet中的列希望能与数据字段配成映射关系,最好还能配置数据类型、正则校验、默认值、是否非空、是否可缺失等配置项;以上者都可以通过xml配置文件来描述

1
2
3
4
5
<Excel>
<sheet name="" unnecessary="false">
<column name="" field="" default="" type="String" pattern="" notnull="false" unnecessary="false" />
</sheet>
</Excel>

具体实现时先初始化Excel文件,解析规则,以及自定义的sheet处理过程,如果都没问题,则开始按行读取文件,读取过程中检测是否按sheet为单位处理或者按固定条数为批次处理,就当读取完一个sheet或一个批次时就处理一次

1.1. initExcelRule

初始化规则即加载xml的配置,并以<sheet, <column, <key, value>>>的结果保存为Map,另外再记录一份necessarySheet表示必需的sheet

https://github.com/shanhm1991/Echo/blob/master/echo-poi/src/main/java/io/github/echo/poi/parse/ParseSheetTask.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
//<sheetName, <columnName, <key, value>>>
private Map<String, Map<String, Map<String,String>>> sheetRule = new LinkedHashMap<>();

private Set<String> necessarySheet = new HashSet<>();

SAXReader reader = new SAXReader();
reader.setEncoding("UTF-8");
Document doc = reader.read(new FileInputStream(ruleXml));
Element root = doc.getRootElement();
for(Object o : root.elements("sheet")){
Element sheet = (Element)o;
String sheetName = sheet.attributeValue("name");
if(StringUtils.isBlank(sheetName)){
continue;
}
Map<String, Map<String,String>> indexMap = new HashMap<>();
for(Object obj : sheet.elements("column")){
Element column = (Element)obj;
String columnName = column.attributeValue("name");
String fileld = column.attributeValue("field");
if(!StringUtils.isBlank(fileld) && !StringUtils.isBlank(columnName)){
Map<String,String> fieldMap = new HashMap<>();
fieldMap.put("field", fileld);
fieldMap.put("type", column.attributeValue("type"));
fieldMap.put("pattern", column.attributeValue("pattern"));
fieldMap.put("default", column.attributeValue("default"));
fieldMap.put("notnull", column.attributeValue("notnull"));
fieldMap.put("unnecessary", column.attributeValue("unnecessary"));
indexMap.put(columnName, fieldMap);
}
}
sheetRule.put(sheetName, indexMap);

String unnecessary = sheet.attributeValue("unnecessary");
if(!"true".equals(unnecessary)){
necessarySheet.add(sheetName);
}
}
1.2. parseRowData

解析行数据就是将列头转换为对应的字段信息,首先将列头信息解析成<列索引-列名称>的映射关系,再结合配置的<列名称-字段>关系,就可以得到<列索引-字段>的关系了,但是实际解析中可能会出现行数据与配置不一致的情况:

  1. sheet中存在的列,但不在配置规则中,则直接忽略,解析时以配置为准;

  2. sheet中不存在的列,但规则有配置,并且unnecessary=”true”(如果为false,则在解析列头信息时就会提前失败),则尝试获取默认值;

  3. sheet中存在的列,规则也有配置,但实际读取行时没有读到这一列(通常在行尾,这时columns.size() <= index),则同样尝试取默认值;

具体实现时,调用者可以根据自定义的type进行字段解析

https://github.com/shanhm1991/Echo/blob/master/echo-poi/src/main/java/io/github/echo/poi/parse/ParseSheetTask.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
@Override
protected List<Map<String, Object>> parseRowData(ExcelRow row, long batchTime) throws Exception {
String sheet = row.getSheetName();
List<String> columns = row.getColumnList();

Map<String,Object> data = new HashMap<>();
data.put("sheet", sheet);
data.put("row", row.getRowIndex());//在数据中补上Excel信息,在后续数据层反馈数据错误时方便定位

Map<String, Map<String,String>> rowRule = sheetRule.get(sheet);//not null(filter)
if(row.getRowIndex() == 1){
LinkedHashMap<Integer,String> link = new LinkedHashMap<>();
for(int i = 0;i < columns.size();i++){
link.put(i, columns.get(i).trim());
}
isDataLack(sheet, link, rowRule);
columnIndexMap.put(sheet, link);
return Arrays.asList(data);
}

LinkedHashMap<Integer,String> indexMap = columnIndexMap.get(sheet);//not null
for(Entry<Integer,String> entry : indexMap.entrySet()){
int index = entry.getKey();
String columnName = entry.getValue();//not null
String columnValue = null;
//3. 规则有配置,Excel也有对应列,但当读取目标行时没有这一列(通常在行尾),此时如果有配置默认值,则取默认值
if(columns.size() > index){
columnValue = columns.get(index).trim();
}

Map<String,String> columnRule = rowRule.get(columnName);
//1. Excel有值列,但不在配置规则中
if(columnRule == null){
continue;
}
String field = columnRule.get("field");
String type = columnRule.get("type");
String pattern = columnRule.get("pattern");
String defaultValue = columnRule.get("default");
String notNull = columnRule.get("notnull");
if(StringUtils.isBlank(columnValue)){
if(!StringUtils.isBlank(defaultValue)){
columnValue = defaultValue;
}else if("true".equals(notNull)){
throw new IllegalArgumentException(
buildError("columnValue cannot be null", sheet, row.getRowIndex(), columnName));
}
}

if(!PatternUtil.match(pattern, columnValue)){
throw new IllegalArgumentException(
buildError("invalid columnValue", sheet, row.getRowIndex(), columnName));
}
parseValue(columnValue, type, field, data, columnName);
}

//2. 规则有配置,但Excel没有对应列,不过配置unnecessary="true",同时有默认值,此时取默认值
for(Entry<String, Map<String,String>> entry : rowRule.entrySet()){
if(indexMap.containsValue(entry.getKey())){
continue;
}
Map<String,String> map = entry.getValue();
String defaultValue = map.get("default");
if(StringUtils.isBlank(defaultValue)){
continue;
}
String field = map.get("field");
String type = map.get("type");
String pattern = map.get("pattern");
String notNull = map.get("notnull");
if("true".equals(notNull)){
throw new IllegalArgumentException(
buildError("columnValue cannot be null", sheet, row.getRowIndex(), entry.getKey()));
}
if(!PatternUtil.match(pattern, defaultValue)){
throw new IllegalArgumentException(
buildError("invalid columnValue", sheet, row.getRowIndex(), entry.getKey()));
}
parseValue(defaultValue, type, field, data, entry.getKey());
}

return Arrays.asList(data);
}

protected void isDataLack(String sheetName, Map<Integer,String> indexMap, Map<String, Map<String,String>> rule){
for(Entry<String, Map<String,String>> entry : rule.entrySet()){
String column = entry.getKey();
String unnecessary = entry.getValue().get("unnecessary");
if("true".equals(unnecessary)){
continue;
}
if(!indexMap.containsValue(column)){
throw new IllegalArgumentException("column not found:" + column + ", sheet=" + sheetName);
}
}
}

protected void parseValue(String value, String type, String field, Map<String,Object> data, String columnName) throws Exception {
if(StringUtils.isBlank(value)){
return;
}
data.put(field, value);
}
1.3. batchProcess

如果没有定义SheetHandler,默认的批处理是将所有sheet数据保存为一个Map,最终在onExcelComplete中可以拿到这些数据,进行校验和处理

https://github.com/shanhm1991/Echo/blob/master/echo-poi/src/main/java/io/github/echo/poi/parse/ParseSheetTask.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
@Override
protected final void batchProcess(List<Map<String, Object>> sheetData, long batchTime) throws Exception {
//not empty
String sheet = sheetData.remove(0).get("sheet").toString();
SheetHandler handler = sheetHandlerMap.get(sheet);
if(handler != null){
handler.handler(sheet, sheetData, batchTime, excelData);
}else{
Collection<Map<String, Object>> singleSheetData = excelData.get(sheet);
if(CollectionUtils.isEmpty(singleSheetData)){
List<Map<String, Object>> list = new ArrayList<>(sheetData.size());
list.addAll(sheetData);
excelData.put(sheet, list);
}else{
singleSheetData.addAll(sheetData);
}
}
}

@Override
protected final E onExcelComplete(String sourceUri, String sourceName) throws Exception {
Set<String> parsedSheet = excelData.keySet();
List<String> list = new ArrayList<>(necessarySheet.size());
for(String name : necessarySheet){
if(!parsedSheet.contains(name)){
list.add(name);
}
}
if(!list.isEmpty()){
throw new IllegalArgumentException("sheet not found:" + list);
}

return handlerData(excelData);
}

protected abstract E handlerData(Map<String, Collection<Map<String, Object>>> excelData) throws Exception;

2. 示例

创建一个简单的Excel,并定义其解析规则

https://github.com/shanhm1991/Echo/tree/master/echo-poi/src/test/resources/excel/sheetRule.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
<?xml version="1.0" encoding="UTF-8"?>
<Excel>
<sheet name="人员">
<column name="姓名" field="name" />
<column name="年龄" field="age" type="number" />
<column name="性别" field="sex" />
</sheet>

<sheet name="部门">
<column name="名称" field="name" />
<column name="人数" field="num" type="number" />
<column name="成立时间" field="createTime" type="number" />
</sheet>
</Excel>

然后自定义实现类并测试

https://github.com/shanhm1991/Echo/blob/master/echo-poi/src/test/java/io/github/echo/poi/SheetTask.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
public class SheetTask extends ParseSheetTask<Boolean> {

private File excel;

public SheetTask(File excel) {
super(excel);
}

protected void parseValue(String value, String type, String field, Map<String,Object> data, String columnName) throws Exception {
if(StringUtils.isBlank(value)){
return;
}

if("number".equals(type)){
data.put(field, Long.valueOf(value));
}else{
data.put(field, value);
}
}

@Override
protected Boolean handlerData(Map<String, Collection<Map<String, Object>>> excelData) throws Exception {
System.out.println(excelData);
return true;
}
}
https://github.com/shanhm1991/Echo/blob/master/echo-poi/src/test/java/io/github/echo/poi/ExcelTest.java
1
2
3
4
5
6
7
8
9
@Test
public void testSheetReader() throws Exception{
Resource resource = new ClassPathResource("SheetTest.xlsx");
File excel = resource.getFile();

SheetTask sheetTask = new SheetTask(excel);
sheetTask.setExcelRule("sheetRule.xml");
sheetTask.call();
}


参考: