总的来说,Spanner 对来自两个研究群体的概念进行了结合和扩充:一个是数据库研究群体,包括熟悉易用的半关系接口,事务和基于 SQL 的查询语言;另一个是系统研究群体,包括可扩展性,自动分区,容错,一致性复制,外部一致性和大范围分布。自从 Spanner 概念成形,我们花费了 5 年以上的时间来完成当前版本的设计和实现。花费这么长的时间,一部分原因在于我们慢慢意识到,Spanner 不应该仅仅解决全球复制的命名空间问题,而且也应该关注 Bigtable 中所丢失的数据库特性。
我们的设计中一个亮点特性就是 TrueTime。我们已经表明,在时间 API 中明确给出时钟不确定性,可以以更加强壮的时间语义来构建分布式系统。此外,因为底层的系统在时钟不确定性上采用更加严格的边界,实现更强壮的时间语义的代价就会减少。作为一个研究群体,我们在设计分布式算法时,不再依赖于弱同步的时钟和较弱的时间 API。
致谢许多人帮助改进了这篇论文:Jon Howell,Atul Adya, Fay Chang, Frank Dabek, Sean Dorward, Bob Gruber, David Held, Nick Kline, Alex Thomson, and Joel Wein. 我们的管理层对于我们的工作和论文发表都非常支持:Aristotle Balogh, Bill Coughran, Urs H ̈olzle, Doron Meyer, Cos Nicolaou, Kathy Polizzi, Sridhar Ramaswany, and Shivakumar Venkataraman.
我们的工作是在Bigtable和Megastore团队的工作基础之上开展的。F1团队,尤其是Jeff Shute ,和我们一起工作,开发了数据模型,跟踪性能和纠正漏洞。Platforms团队,尤其是Luiz Barroso 和Bob Felderman,帮助我们一起实现了TrueTime。最后,许多谷歌员工都曾经在我们的团队工作过,包括Ken Ashcraft, Paul Cychosz, Krzysztof Ostrowski, Amir Voskoboynik, Matthew Weaver, Theo Vassilakis, and Eric Veach; or have joined our team recently: Nathan Bales, Adam Beberg, Vadim Borisov, Ken Chen, Brian Cooper, Cian Cullinan, Robert-Jan Huijsman, Milind Joshi, Andrey Khorlin, Dawid Kuroczko, Laramie Leavitt, Eric Li, Mike Mammarella, Sunil Mushran, Simon Nielsen, Ovidiu Platon, Ananth Shrinivas, Vadim Suvorov, and Marcel van der Holst.
参考文献[1] Azza Abouzeid et al. ―HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads‖. Proc. of VLDB. 2009, pp. 922–933.
[2] A. Adya et al. ―Efficient optimistic concurrency control using loosely synchronized clocks‖. Proc. of SIGMOD. 1995, pp. 23–34.
[3] Amazon. Amazon DynamoDB. 2012.
[4] Michael Armbrust et al. ―PIQL: Success-Tolerant Query Processing in the Cloud‖. Proc. of VLDB. 2011, pp. 181–192.
[5] Jason Baker et al. ―Megastore: Providing Scalable, Highly Available Storage for Interactive Services‖. Proc. of CIDR. 2011, pp. 223–234.
[6] Hal Berenson et al. ―A critique of ANSI SQL isolation levels‖. Proc. of SIGMOD. 1995, pp. 1–10. [7] Matthias Brantner et al. ―Building a database on S3‖. Proc. of SIGMOD. 2008, pp. 251–264.
[7] Matthias Brantner et al. ―Building a database on S3‖. Proc. of SIGMOD. 2008, pp. 251–264.
[8] A. Chan and R. Gray. ―Implementing Distributed Read-Only Transactions‖. IEEE TOSE SE-11.2 (Feb. 1985), pp. 205–212.
[9] Fay Chang et al. ―Bigtable: A Distributed Storage System for Structured Data‖. ACM TOCS 26.2 (June 2008), 4:1–4:26.
[10] Brian F. Cooper et al. ―PNUTS: Yahoo!’s hosted data serving platform‖. Proc. of VLDB. 2008, pp. 1277–1288.
[11] James Cowling and Barbara Liskov. ―Granola: Low-Overhead Distributed Transaction Coordination‖. Proc. of USENIX ATC. 2012, pp. 223–236.
[12] Jeffrey Dean and Sanjay Ghemawat. ―MapReduce: a flexible data processing tool‖. CACM 53.1 (Jan. 2010), pp. 72–77.
[13] John Douceur and Jon Howell. Scalable Byzantine-Fault-Quantifying Clock Synchronization. Tech. rep. MSR-TR-2003-67. MS Research, 2003.
[14] John R. Douceur and Jon Howell. ―Distributed directory service in the Farsite file system‖. Proc. of OSDI. 2006, pp. 321–334.
[15] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. ―The Google file system‖. Proc. of SOSP. Dec. 2003, pp. 29–43.
[16] David K. Gifford. Information Storage in a Decentralized Computer System. Tech. rep. CSL-81-8. PhD dissertation. Xerox PARC, July 1982.
[17] Lisa Glendenning et al. ―Scalable consistency in Scatter‖. Proc. of SOSP. 2011.
[18] Jim Gray and Leslie Lamport. ―Consensus on transaction commit‖. ACM TODS 31.1 (Mar. 2006), pp. 133–160.
[19] Pat Helland. ―Life beyond Distributed Transactions: an Apostate’s Opinion‖. Proc. of CIDR. 2007, pp. 132–141.
[20] Maurice P. Herlihy and Jeannette M. Wing. ―Linearizability: a correctness condition for
concurrent objects‖. ACM TOPLAS 12.3 (July 1990), pp. 463–492.
[21] Leslie Lamport. ―The part-time parliament‖. ACM TOCS 16.2 (May 1998), pp. 133–169.
[22] Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. ―Reconfiguring a state machine‖. SIGACT News 41.1 (Mar. 2010), pp. 63–73.
[23] Barbara Liskov. ―Practical uses of synchronized clocks in distributed systems‖. Distrib. Comput. 6.4 (July 1993), pp. 211–219.
[24] David B. Lomet and Feifei Li. ―Improving Transaction-Time DBMS Performance and Functionality‖. Proc. of ICDE (2009), pp. 581–591.
[25] Jacob R. Lorch et al. ―The SMART way to migrate replicated stateful services‖. Proc. of EuroSys. 2006, pp. 103–115.
[26] MarkLogic. MarkLogic 5 Product Documentation. 2012.
[27] Keith Marzullo and Susan Owicki. ―Maintaining the time in a distributed system‖. Proc. of PODC. 1983, pp. 295–305.
[28] Sergey Melnik et al. ―Dremel: Interactive Analysis of Web-Scale Datasets‖. Proc. of VLDB. 2010, pp. 330–339.
[29] D.L. Mills. Time synchronization in DCNET hosts. Internet Project Report IEN–173. COMSAT Laboratories, Feb. 1981.
[30] Oracle. Oracle Total Recall. 2012.
[31] Andrew Pavlo et al. ―A comparison of approaches to large-scale data analysis‖. Proc. of SIGMOD. 2009, pp. 165–178.
[32] Daniel Peng and Frank Dabek. ―Large-scale incremental processing using distributed transactions and notifications‖. Proc. of OSDI. 2010, pp. 1–15.
[33] Daniel J. Rosenkrantz, Richard E. Stearns, and Philip M. Lewis II. ―System level concurrency control for distributed database systems‖. ACM TODS 3.2 (June 1978), pp. 178–198.
[34] Alexander Shraer et al. ―Dynamic Reconfiguration of Primary/Backup Clusters‖. Proc. of
SENIX ATC. 2012, pp. 425–438.
[35] Jeff Shute et al. ―F1—The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business‖. Proc. of SIGMOD. May 2012, pp. 777–778.
[36] Yair Sovran et al. ―Transactional storage for geo-replicated systems‖. Proc. of SOSP. 2011, pp. 385–400.
[37] Michael Stonebraker. Why Enterprises Are Uninterested in NoSQL. 2010.
[38] Michael Stonebraker. Six SQL Urban Myths. 2010.
[39] Michael Stonebraker et al. ―The end of an architectural era: (it’s time for a complete rewrite)‖. Proc. of VLDB. 2007, pp. 1150–1160.
[40] Alexander Thomson et al. ―Calvin: Fast Distributed Transactions for Partitioned Database Systems‖. Proc. of SIGMOD.2012, pp. 1–12.
[41] Ashish Thusoo et al. ―Hive — A Petabyte Scale Data Warehouse Using Hadoop‖. Proc. of ICDE. 2010, pp. 996–1005.
[42] VoltDB. VoltDB Resources. 2012.