XML Data – Semi-Structured Data

Outline
• Structured, Semi-structured, and Unstructured Data
• XML Hierarchical (Tree) Data Model
• Extracting XML Documents from Relational Databases
• XML Documents, DTD, and XML Schema
• XML Languages

Structured, Semi-structured, and Unstructured Data
• Structured data
  ◦ Represented in a strict format (a schema)
  ◦ Example: information stored in databases
• Semi-structured data
  ◦ Has a certain structure
  ◦ Not all of the information collected has an identical structure
• Unstructured data
  ◦ Gives only a limited indication of the type of data
  ◦ Example: a document that contains information embedded within it

Examples
• Structured: Excel spreadsheets, comma-separated value (.csv) files, relational database tables
• Semi-structured: Hypertext Markup Language (HTML) files, JavaScript Object Notation (JSON) files, Extensible Markup Language (XML) files
• Unstructured: audio, video, flat text

Semi-structured Data
• Schema information is mixed in with the data values
• Self-describing data
• May be displayed as a directed graph, where labels or tags on the directed edges represent:
  ◦ Schema names
  ◦ Names of attributes
  ◦ Object types (or entity types or classes)
  ◦ Relationships

XML: Extensible Markup Language
• Data sources
  ◦ Databases storing data for Internet applications
  ◦ XML is a standard for data representation and exchange
• Hypertext documents (HTML)
  ◦ Common method of specifying the contents and formatting of Web pages
• XML data model
  ◦ Tags describe content instead of formatting
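As a rough illustration of the directed-graph view of semi-structured data, the Python sketch below flattens a nested record into labeled edges. The record contents and the helper name `to_edges` are made up for this example. The two worker records carry different fields, which is the kind of irregularity that makes data semi-structured rather than structured.

```python
def to_edges(obj):
    """Flatten nested dicts/lists into (parent_node, label, child) edges."""
    edges = []
    counter = [0]  # mutable counter used to name internal nodes n1, n2, ...

    def visit(value, label, parent):
        if isinstance(value, dict):
            counter[0] += 1
            node = f"n{counter[0]}"
            if parent is not None:
                edges.append((parent, label, node))
            for key, child in value.items():
                visit(child, key, node)
        elif isinstance(value, list):
            # Each list item becomes its own edge with the same label.
            for item in value:
                visit(item, label, parent)
        else:
            # Leaf: the edge points directly at the data value.
            edges.append((parent, label, value))

    visit(obj, None, None)
    return edges


# Hypothetical record: the second worker has a field the first one lacks.
record = {
    "project": [
        {
            "name": "Product X",
            "location": "Bellaire",
            "worker": [
                {"ssn": "123456789", "hours": 32.5},
                {"ssn": "453453453", "last_name": "English"},
            ],
        }
    ]
}

edges = to_edges(record)
for edge in edges:
    print(edge)
```

The edge labels (`project`, `worker`, `ssn`, ...) play the role of schema information stored alongside the values, which is why such data is called self-describing.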

XML Hierarchical (Tree) Data Model
• Elements and attributes
  ◦ Main structuring concepts used to construct an XML document
• Complex elements
  ◦ Constructed hierarchically from other elements
• Simple elements
  ◦ Contain data values
• XML tag names
  ◦ Describe the meaning of the data elements in the document
  ◦ Start tag: name in angle brackets, <...>
  ◦ End tag: name preceded by a slash, </...>

[Figure: Company ER Model]
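To make the tree model concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The sample document is hypothetical (its element names follow the `/companyDB/employees/employee` paths used later in these notes), and the helper `describe` is an illustrative name. It labels each element as complex (has child elements) or simple (holds a data value).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative.
DOC = """
<companyDB>
  <employees>
    <employee ssn="123456789">
      <fname>John</fname>
      <lname>Smith</lname>
      <address>731 Fondren, Houston, TX</address>
    </employee>
    <employee ssn="333445555">
      <fname>Franklin</fname>
      <lname>Wong</lname>
      <address>638 Voss, Houston, TX</address>
    </employee>
  </employees>
</companyDB>
"""


def describe(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    has_children = len(element) > 0
    kind = "complex" if has_children else "simple"
    text = "" if has_children else (element.text or "").strip()
    line = f"{'  ' * depth}{element.tag} [{kind}] attrs={dict(element.attrib)} text={text!r}"
    lines = [line]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Here `companyDB`, `employees`, and `employee` are complex elements, `fname`, `lname`, and `address` are simple elements, and `ssn` is an attribute.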

[Figure: Company Relational Model]

[Figure: Company Entities]

[Figure: Relational to XML Mapping]

[Figure: Relational Model vs. XML]

Knowledge Check
• You're creating a database to contain information about university records: students, courses, grades, etc. Should you use the relational model or XML?
• You're creating a database to contain information for a university web site: news, academic announcements, admissions, events, research, etc. Should you use the relational model or XML?
• You're creating a database to contain information about family trees (ancestry). Should you use the relational model or XML?

"Well-Formed" XML
Adheres to basic structural requirements:
• A single root element
• Matched tags with proper nesting
• Unique attributes within each element
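The three well-formedness rules can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which rejects input that is not well-formed by raising `ParseError`. The sample strings and the helper name `is_well_formed` are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# One well-formed sample, then one violation of each rule.
samples = {
    "ok": "<a><b x='1'/></a>",
    "two roots": "<a/><b/>",
    "bad nesting": "<a><b></a></b>",
    "duplicate attribute": "<a x='1' x='2'/>",
}

results = {}
for name, xml_text in samples.items():
    ok, message = is_well_formed(xml_text)
    results[name] = ok
    print(f"{name}: {'well-formed' if ok else 'rejected (' + message + ')'}")
```

Passing this check says nothing about whether the content makes sense for an application; it only confirms the basic structure.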

Displaying XML
Use a rule-based language to translate XML to HTML:
• Cascading Style Sheets (CSS)
• Extensible Stylesheet Language (XSL)

Extensible Markup Language (XML)
• Standard for data representation and exchange
• The formal specification is enormous; we cover the most important components

"Valid" XML
• Adheres to the basic structural requirements (it is well-formed)
• Also adheres to a content-specific specification, written as one of:
  ◦ Document Type Definition (DTD)
  ◦ XML Schema Definition (XSD)
In other words, valid XML is well-formed XML that additionally conforms to a content-specific specification.

Document Type Definition (DTD)
• Grammar-like language for specifying elements, attributes, nesting, ordering, and number of occurrences
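As a small illustration, the hypothetical document below carries an internal DTD that declares nesting (`employees` inside `companyDB`), ordering (`fname` before `lname`), occurrences (`*` for zero or more, `?` for optional), and a required attribute. Python's standard `xml.etree.ElementTree` reads the DTD but does not validate against it, so the sketch adds a hand-written check, `employee_conforms` (an illustrative name, not a real validator), that mirrors only the two `employee` declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD subset; all names are illustrative.
DOC = """<!DOCTYPE companyDB [
  <!ELEMENT companyDB (employees)>
  <!ELEMENT employees (employee*)>
  <!ELEMENT employee (fname, lname, address?)>
  <!ATTLIST employee ssn CDATA #REQUIRED>
  <!ELEMENT fname (#PCDATA)>
  <!ELEMENT lname (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
]>
<companyDB>
  <employees>
    <employee ssn="123456789"><fname>John</fname><lname>Smith</lname></employee>
    <employee><lname>Wong</lname><fname>Franklin</fname></employee>
  </employees>
</companyDB>
"""

# Child sequences permitted by the content model (fname, lname, address?).
ALLOWED_CHILD_ORDERS = (["fname", "lname"], ["fname", "lname", "address"])


def employee_conforms(employee):
    """Hand-coded mirror of the employee declarations above (not a DTD validator)."""
    child_tags = [child.tag for child in employee]
    return "ssn" in employee.attrib and child_tags in ALLOWED_CHILD_ORDERS


# Parsing succeeds because the document is well-formed, even though it is not valid.
root = ET.fromstring(DOC)
checks = [employee_conforms(e) for e in root.iter("employee")]
print(checks)
```

The second `employee` is well-formed but would not be valid: it lacks the required `ssn` attribute and lists `lname` before `fname`. A validating parser would report this directly from the DTD.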

XML Schema (XSD)
• Extensive language
• Like DTDs, can specify elements, attributes, nesting, ordering, and number of occurrences
• Can also specify data types, keys, (typed) pointers, and more
• XSD is itself written in XML

DTD/XSD vs. None (Well-Formed Only)
• Advantages of using a DTD or XSD
  ◦ Programs can assume a structure
  ◦ CSS/XSL rules are simpler when the document has a known structure
  ◦ Acts as a specification language: the DTD states what the XML should look like
  ◦ Serves as documentation
  ◦ Strongly typed data
• Disadvantages of using a DTD or XSD
  ◦ Less flexibility; changes are harder to make
  ◦ A DTD can become messy when the data has an irregular structure
  ◦ Gives up the benefits of untyped data

Querying XML
• Not nearly as mature as querying relational data
  ◦ Newer
  ◦ No underlying algebra
• Sequence of development
  ◦ XPath: path expressions + conditions
  ◦ XSLT: XPath + transformations, output formatting
  ◦ XQuery: XPath + full-featured query language

XPath = Path expressions + Conditions

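• Basic constructs
  ◦ / : root element; also the separator between path steps
  ◦ X : element name X
  ◦ * : matches any element at that step of the path
  ◦ @ : attribute name
  ◦ // : any descendant, including self
  ◦ [C] : condition
  ◦ [N] : selects the Nth child by position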
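The constructs above can be tried out with Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath. In this sketch the paths are written relative to the root element (`./employees/...` instead of `/companyDB/employees/...`) because that module does not accept absolute paths on an element. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative.
DOC = """
<companyDB>
  <employees>
    <employee ssn="123456789">
      <fname>John</fname>
      <lname>Smith</lname>
      <address>731 Fondren, Houston, TX</address>
    </employee>
    <employee ssn="333445555">
      <fname>Franklin</fname>
      <lname>Wong</lname>
      <address>638 Voss, Houston, TX</address>
    </employee>
  </employees>
</companyDB>
"""
root = ET.fromstring(DOC)  # root is the companyDB element

# X and / : step through named child elements.
lnames = [e.text for e in root.findall("./employees/employee/lname")]

# * : any child element of the first employee.
child_tags = [e.tag for e in root.findall("./employees/employee[1]/*")]

# // : any descendant named fname, at any depth.
fnames = [e.text for e in root.findall(".//fname")]

# [C] with @ : condition on an attribute value.
wong = root.findall(".//employee[@ssn='333445555']")

# [N] : positional access, counting from 1.
second = root.find("./employees/employee[2]")

print(lnames, child_tags, fnames, wong[0].findtext("lname"), second.get("ssn"))
```

A full XPath engine would accept the absolute forms directly, for example `/companyDB/employees/employee[2]`.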

• Built-in functions (lots of them)
  ◦ starts-with() and contains() operate on string values and are useful for selecting elements by substring match:
    /companyDB/employees/employee[starts-with(lname,"S")]
    /companyDB/employees/employee[contains(address,"Philadelphia")]
• Navigation "axes" (13 of them)
  ◦ Keywords that allow us to move in multiple directions from the current node in a path expression
  ◦ Include self, child, descendant, attribute, parent, ancestor, preceding-sibling, and following-sibling
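Python's standard `xml.etree.ElementTree` does not implement XPath functions such as `starts-with()` or `contains()`, so the sketch below only approximates the two queries above: it selects the `employee` elements with a path expression and then applies the string tests in Python. The sample data, including the Philadelphia address, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and addresses are illustrative.
DOC = """
<companyDB>
  <employees>
    <employee ssn="333445555">
      <lname>Wong</lname>
      <address>638 Voss, Houston, TX</address>
    </employee>
    <employee ssn="123456789">
      <lname>Smith</lname>
      <address>731 Fondren, Houston, TX</address>
    </employee>
    <employee ssn="000000001">
      <lname>Stone</lname>
      <address>12 Market St, Philadelphia, PA</address>
    </employee>
  </employees>
</companyDB>
"""
root = ET.fromstring(DOC)
employees = root.findall("./employees/employee")

# Approximates: /companyDB/employees/employee[starts-with(lname,"S")]
starts_with_s = [
    e.findtext("lname") for e in employees
    if e.findtext("lname", "").startswith("S")
]

# Approximates: /companyDB/employees/employee[contains(address,"Philadelphia")]
in_philadelphia = [
    e.findtext("lname") for e in employees
    if "Philadelphia" in e.findtext("address", "")
]

print(starts_with_s, in_philadelphia)
```

With an engine that supports full XPath, the two expressions could be evaluated as written, without the Python-side filtering.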

More Details
• XPath queries operate on, and return, a sequence of elements
  ◦ XML document
  ◦ XML stream
• Sometimes the result can be expressed as XML, but not always

XQuery: FLWOR Expression
• FLWOR stands for For, Let, Where, Order by, Return
• All clauses except Return are optional
• For and Let can be repeated and interleaved

Mixing Queries and XML
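To show how the FLWOR clauses fit together, and how a query can build new XML in its Return clause, the sketch below places a small XQuery expression in a comment and mirrors it step by step in Python. This is an analogy under stated assumptions, not an XQuery implementation: the sample document, the query, and the `result`/`name` output elements are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and addresses are illustrative.
DOC = """
<companyDB>
  <employees>
    <employee><lname>Wong</lname><address>638 Voss, Houston, TX</address></employee>
    <employee><lname>Smith</lname><address>731 Fondren, Houston, TX</address></employee>
    <employee><lname>Stone</lname><address>12 Market St, Philadelphia, PA</address></employee>
  </employees>
</companyDB>
"""

# XQuery being mirrored (for comparison only, not executed here):
#   <result>{
#     for $e in /companyDB/employees/employee
#     let $name := $e/lname
#     where contains($e/address, "Houston")
#     order by $name
#     return <name>{ data($name) }</name>
#   }</result>

root = ET.fromstring(DOC)
matches = []
for e in root.findall("./employees/employee"):     # For: iterate over a sequence
    name = e.findtext("lname", "")                 # Let: bind a variable
    if "Houston" in e.findtext("address", ""):     # Where: filter
        matches.append(name)

result = ET.Element("result")
for name in sorted(matches):                       # Order by: sort the survivors
    ET.SubElement(result, "name").text = name      # Return: construct new XML

xml_out = ET.tostring(result, encoding="unicode")
print(xml_out)
```

The Return step is where queries and XML mix: the output is itself an XML fragment assembled from the query results.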

Summary
• Three main types of data: structured, semi-structured, and unstructured
• XML standard
  ◦ Tree-structured (hierarchical) data model
  ◦ XML documents and the languages for specifying the structure of these documents
• XPath and XQuery languages
  ◦ Query XML data
