OK, so this picks up where Part 1 (the keynote write-up) left off.
Just to jot down a few quick impressions of the afternoon sessions as well:
- Nova left me with a distinctly uneasy feeling (after listening to 【F-3】 and 【F-4】)...
- I had come to hear about OpenStack, but the final session about Ceph, "【F-6】Ceph loves OpenStack", turned out to be the most carefully explained and easiest to follow
- Japanese speakers really do explain things carefully (most of the Red Hat Forum sessions were given by speakers from the US headquarters, so this is in comparison with them)
For the record, the five sessions I attended were:
- 【F-1】12:50~13:30 OpenStack, OpenDaylight and OPNFV - How OpenStack relates to NFV, and the latest on the OpenDaylight project (Chris Wright)
- 【F-3】14:40~15:20 Transform IT with RH Enterprise Linux OpenStack Platform - The latest on Red Hat's OpenStack (Jeff Jameson)
- 【F-4】15:50~16:30 OpenStack Nova Technical Deepdive (Nikola Dipanov)
- 【F-5】16:45~17:25 OpenStack Nova Deepdive Advanced (Nikola Dipanov again)
- 【F-6】17:40~18:20 Ceph loves OpenStack: Why and How (Haruka Iwao)
Below are the notes I took in each session.
【F-1】12:50~13:30 OpenStack, OpenDaylight and OPNFV - How OpenStack relates to NFV, and the latest on the OpenDaylight project (Chris Wright)
- Agenda - emerging technology story.
- SW defined Networking (SDN)
- SDN and NW virtualization
- SDN is many things to many people
- separation of control plane and data plane
- programmatic IF for NW control
- NW virtualization
- Decoupling logical (overlay) NW topology from physical (underlay) topology
- RH focus
- virtual NWs defined using OpenStack NW service (Neutron)
- VXLAN overlay for decoupling and scalability
- Layer 2-7
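Just to make the "virtual NWs defined using OpenStack NW service (Neutron)" point above a bit more concrete, here is a minimal sketch of what defining such a network looks like from the client side; the ML2 mechanism driver (OpenDaylight in RH's case) then realizes it as a VXLAN overlay behind this same API. This wasn't shown in the session, and the endpoint/credentials are placeholders; I'm assuming python-neutronclient here.

```python
# Minimal sketch: define a tenant network + subnet through the Neutron API.
# Credentials/endpoint are placeholders; how the network is actually wired
# underneath is up to the ML2 mechanism driver (e.g. the ODL driver).
from neutronclient.v2_0 import client

neutron = client.Client(username='demo',
                        password='secret',
                        tenant_name='demo',
                        auth_url='http://controller:5000/v2.0')

net = neutron.create_network({'network': {'name': 'demo-net'}})['network']
neutron.create_subnet({'subnet': {'network_id': net['id'],
                                  'ip_version': 4,
                                  'cidr': '10.0.0.0/24'}})
```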
- OpenDaylight?
- OSS SDN
- open
- transparent
- merit-based
- Consortium
- facilitate
- advocate
- support
- RH is platinum founding member
- OpenDaylight SDN Platform
- Modular, extensible pluggable
- Java, OSGi, Karaf based platform
- Evolving towards model driven using YANG
- Multi-protocol
- Openflow + other protocol
- Eclipse Public License
- OpenDaylight SDN Platform
- image
- Hydrogen Release
- Feb, 4, 2014 first released
- Over 150 contributors, over 3M lines of code in 12 projects
- Black Duck "Rookie of the Year"
- Open NW summit SDN Idol Finalist
- Winner INTEROP Best in SDN
- Winner INTEROP Best in Show
- Three editions
- Projects in the Hydrogen Release
- Controller
- VTN # big contribution of NEC
- OpenDove
- Affinity Management Service
- LISP Mapping Service
- Yang Tools
- Defense4All
- OVSDB #
- etc... all 12 projects
- RH focus on the Neutron connection, the OpenStack NW service
- RH ODL Focus
- Integration with OpenStack
- ML2 ODL driver
- Overlay NWs
- Standards based
- OVSDB, OpenFlow 1.3, OpenStack Neutron
- Helium Release
- 2nd release, Oct 2, 2014
- over 200 contributors, over 4M lines of code in 21 projects
- One Karaf "edition"
- Feature-based configuration
- commit statistics
- pie chart showing different companies' contributions
- RH is the No. 2 contributor
- Helium Release
- AAA
- etc, 21 project
- RH focused on
- Continued integration with OpenStack
- ML2 ODL Driver + extensions (L3, *aaS)
- Overlay NW management
- Add OVSDB HW_VTEP schema support
- Underlay informing (e.g. QoS)
- MD-SAL
- AAA # Authorization: keystone(OpenStack)
- OpFlex
- SFC (NFV context)
- Infrastructure (testing and performance)
- SDN and NW virtualization
- Optimized Data Plane
- Open vSwitch
- Multi-layer virtual switch
- OVSDB: config managed by
- OpenFlow: flow tables controlled by
- Provides connection between VMs on same Host
- Provides uplink to physical NW via host NIC
- Data fast path in-kernel
- Challenges
- kernel NW stack can be a bottleneck
- 64-byte packet processing rates suffer
- Microflows vs. megaflows
- DPDK
- Library for userspace packet processing
- Directly manages the NIC with a userspace poll mode driver (PMD)
- Polls driver NIC for packets, NIC DMAs directly to application buffers
- Platform specific optimizations
- Hugepages, NUMA and cacheline aware
- Batched packet processing
- CPU instructions (SSE4, AVX, etc)
- Challenges
- API/ABI compatibility, difficult to package in distribution
- Duplicate driver stacks, limited driver support
- compile time rather than runtime optimizations
- Currently x86-centric
- OVS integration disables kernel features
- OVS + DPDK
- Intel reports improved packet processing rates
- 10 times faster than OVS with kernel v-host
- ivshmem and memnic
- ivshmem - to share memory between VMs
- memnic - format shared memory segment as NIC
- Challenges
- ivshmem not well supported upstream QEMU
- disables live migration
- new driver in VM
- vhost-user
- vhost-net allows virtio to bypass QEMU, all in kernel
- OVS + DPDK is in userspace
- vhost-user allows virtio to bypass QEMU, all in userspace
- Challenges
- performance parity w/memnic
- SR-IOV
- capable NIC has embedded switch
- ...
- NW functions virtualization (NFV)
- NFV
- NW functions are trapped in function-specific HW
- virtualize NW functions
- Distribute VNFs on COTS-based IaaS - a Cloud
- Steer traffic with SDN
- Why NFV?
- Reduce time to market for new services
- improve business agility
- Reduce CAPEX and OPEX
- NFV value to Ops
- OPNFV?
- OSS NFV reference implementation
- Consortium
- Facilitate
- advocate
- Support
- RH is platinum founding member
- Architecture
- OpenDaylight
- Linux KVM
- OVS + DPDK
- OpenStack
- All routes lead to OpenStack
- Putting it all together
- NFV OpenStack challenges
- Performance
- Determinism
- Reliability
- NFV OpenStack performance and determinism
- NUMA-aware CPU, memory and IO scheduling
- VM memory backed by hugepages
- ...
- Reliability
- All infra deployed with HA
- VM HA (non-cloud aware application)
- rich monitoring requirement
- Fault detection, resource consumption
- ability to monitor KPIs
- NFV OpenStack Misc
- Making NFV and OpenStack real
- wiki.openstack NFV
【F-3】14:40~15:20 Transform IT with RH Enterprise Linux OpenStack Platform - The latest on Red Hat's OpenStack (Jeff Jameson)
- Workloads are transforming again
- Traditional workloads to cloud workloads
- Traditional workloads
- typically resides on a single large virtual machine
- cannot tolerate downtime
- requires HA
- application scales up rather than out
- Cloud workloads
- workloads reside on multiple VMs
- tolerates Failure
- Why are we doing this?
- Our data is too large
- vast amount of data
- way past the ability of traditional systems and apps
- scaling up no longer works
- Service requests are too large
- more and more client devices coming online
- much harder to maintain service to customers
- Applications weren't written to cope with demand
- Why OpenStack?
- Brings public cloud-like capabilities into your DC
- provides massive on-demand (scale out) capacity
- 1,000s -> 10,000s -> 100ks of VMs
- It's Open
- Community development = higher "feature velocity"
- features and functions you need, faster to market than with proprietary SW
- What is OpenStack?
- A massively scalable infra as a service platform
- HORIZON, NOVA, GLANCE, SWIFT, NEUTRON, CINDER, HEAT, CEILOMETER, KEYSTONE
- each is developed independently but they work closely together
- Designed as modular services
- Built for scale out architecture
- Why RH?
- OpenStack is dependent on the underlying Linux
- Running on top of the Linux OS.
- dependent on all Linux functionality
- performance, etc... all.
- needs access to x86 HW resources
- Needs an operating environment, hypervisor, other system services
- Uses existing code libraries for functionality
- and they are confident RH Enterprise Linux is truly reliable
- OpenStack is optimized and co-engineered with RH Enterprise Linux
- The importance of integration with RH enterprise Linux
- A typical OpenStack cloud is made up of
- core cloud services
- nova, glance, swift, ...
- Plugins to interact with 3rd party ....
- Examples of RHEL optimized enablers for OpenStack
- Virtualization
- Security - SELinux
- NW - SDN/OVN
- Storage - vendor plugins, performance, thin provisioning (Ceph)
- Ecosystem - certification of HW, Storage, and NW
- the pairing of the Linux OS and OpenStack is so close that RH is uniquely positioned to most effectively support functionality, performance, security, system-wide stability, and the ecosystem
- World's largest OpenStack partner ecosystem
- RH OpenStack Cloud infra Partner NW
- 235+ members
- over 900 certified solutions in partner Marketplace
- over 4,000 RHEL certified compute servers
- over 13,000 applications available on RHEL
- Large catalog of windows certified applications
- RH community leadership
- top contributor to Juno release
- activity.openstack.org/dash/browser
- Proof that RH has skills, resources to
- Support, etc...
- wide ranging participation, contrasts with most others who are more narrowly focused
- RH has created enterprise distribution
- service for OpenStack and Cloud
- Training
- Certification
- Consulting
- Who is actually using this (customer success)?
- NCI (National Computational Infrastructure)
- AU-based organization
- Deployed RH Enterprise Linux OpenStack Platform
- requires the security certifications RH provided
- NANYANG TECHNOLOGICAL UNIV.
- Deployed a hybrid cloud infra with RH Enterprise Linux OpenStack Platform
- scalability
- automatic resource provisioning
- saving cost (allowed better use of existing resources)
- greater collab between agencies
- Summary
- All the benefits of community OpenStack, plus
- Enterprise hardened code
- integrated with RH Enterprise Linux
- Enterprise SW lifecycle
- World-wide global support
- Partner ecosystem
- training, certification, and consulting
- integrated with a trusted stack
- RH CloudForms
- RH Enterprise Virtualization
- RH Storage (incl. Ceph)
- Foundation for OpenShift (PaaS)
- OpenStack enables users to realize a hybrid cloud, e.g. AWS + on-premise
- What analysts are saying
- RH is applying its experience in commercializing OSS Linux for the enterprise, and its methodology, to OpenStack
- "The company has made some smart moves in the OpenStack space and it'll work out for them... they've always been the OSS company"
- 3 ways to get OpenStack from RH
- 90-day evaluation
- Purchase a supported product
- Enterprise Linux OpenStack Platform
- Cloud Infrastructure
【F-4】15:50~16:30 OpenStack Nova Technical Deepdive (Nikola Dipanov)
- whoami
- Hacking on Nova since 2012
- Core reviewer since 2013
- Topics covered
- Overview of Nova deployment and services
- A look at how services communicate
- Closer look into internals of some of them (conductor and scheduler)
- Nova Objects
- Nova Cells Services
- OpenStack Nova in a nutshell
- Manage cloud compute resources through a REST API
- Schedule and provision VMs
- Storage and NWing handled by other components
- VM lifecycle management (start, stop, resize, snapshot...
- Nova is
- service oriented architecture
- a number of services with different functions communicating through a message bus
- system state kept in a central DB
- Logical diagram (image)
- Queue
- nova-api
- console
- compute
- etc..
- Services - the cast of characters
- Core: APIs, Scheduler, Conductor, Compute, and maybe NW
- Helper: Console proxies, consoleauth, objectstore
- Non-nova: MariaDB, RabbitMQ, memcached, libvirtd
- Actual deployment image
- DC scenario
- general case, a lot of compute nodes
- Cloud controller spread across multiple nodes
- Compute nodes with disks
- optional
- monitoring, VPN, etc..
- a.k.a. how the services communicate
- $ nova boot --image fedora --flavor 1 test
- explaining how this works with diagram.
- API
- Scheduler
- Conductor
- Compute
- libvirtd
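For my own reference: the `$ nova boot --image fedora --flavor 1 test` request above can also be made from Python, and it is this one API call that then fans out through scheduler, conductor and compute as in the diagram. A hedged sketch with python-novaclient (credentials, endpoint and names are placeholders; on older novaclient the version string is '1.1' rather than '2'):

```python
# Rough python-novaclient equivalent of: nova boot --image fedora --flavor 1 test
# Credentials/endpoint are placeholders.
from novaclient import client

nova = client.Client('2', 'demo', 'secret', 'demo',
                     'http://controller:5000/v2.0')

image = nova.images.find(name='fedora')   # what --image fedora resolves to
flavor = nova.flavors.get(1)              # what --flavor 1 resolves to
server = nova.servers.create('test', image, flavor)
print(server.status)                      # stays BUILD until nova-compute is done
```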
- RPC and oslo.messaging
- General purpose messaging library
- closely maps to AMQP but not only
- different drivers (Rabbit, qpid, zmq)
- supports versions (versioning done in app code)
- by default uses eventlet green threads for dispatching conn
- ex. python (a fuller runnable sketch follows below)
- cctxt = self.client.prepare(server=host, version=version)
- cctxt.cast(ctxt, 'build_and_run_instance', **data)
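The two lines above are the shape of the call as it appears inside Nova. As a self-contained sketch of the same oslo.messaging cast pattern (my reconstruction, not slide material: transport, topic and server names are placeholders, and a reachable RabbitMQ with default settings is assumed):

```python
# Minimal oslo.messaging RPC "cast" sketch, mirroring the snippet above.
# Topic/server values are placeholders; a running broker is assumed.
from oslo.config import cfg
from oslo import messaging

transport = messaging.get_transport(cfg.CONF)      # e.g. the default rabbit backend
target = messaging.Target(topic='compute', version='3.0')
client = messaging.RPCClient(transport, target)

# prepare() pins the call to one server and an RPC API version;
# cast() is fire-and-forget, i.e. no return value is waited for.
cctxt = client.prepare(server='compute-01', version='3.33')
cctxt.cast({}, 'build_and_run_instance', instance={'uuid': 'placeholder'})
```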
- Scheduler
- Filter scheduler
- The only service that is not completely horizontally scalable
- designed to be non-blocking and to favor quick decisions over correctness
- in practice, can be a bottleneck because it "learns the world" on every request (see the toy sketch after this block)
- actually there is a caching technique.
- Scheduler in more detail
- opportunistic scheduling - requests can fail when capacity is low
- simple filtering logic
- ...
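My mental model of the filter scheduler, as a toy sketch (not Nova's actual code): filter out the hosts that cannot take the request at all, weigh what is left, and pick the best one. Because this view of the world is rebuilt per request, you can see where the "learns the world on every request" cost comes from.

```python
# Toy filter-scheduler sketch (not Nova's real implementation):
# hard filters first, then a weigher picks the "best" remaining host.
def schedule(hosts, req_vcpus, req_ram_mb):
    # Filtering: drop hosts that cannot satisfy the request.
    candidates = [h for h in hosts
                  if h['free_vcpus'] >= req_vcpus and h['free_ram_mb'] >= req_ram_mb]
    if not candidates:
        # Opportunistic: no locking, the request simply fails (NoValidHost).
        raise RuntimeError('NoValidHost')
    # Weighing: prefer the host with the most free RAM (one common weigher).
    return max(candidates, key=lambda h: h['free_ram_mb'])

hosts = [{'name': 'c1', 'free_vcpus': 4, 'free_ram_mb': 2048},
         {'name': 'c2', 'free_vcpus': 8, 'free_ram_mb': 8192}]
print(schedule(hosts, req_vcpus=2, req_ram_mb=1024)['name'])   # -> c2
```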
- Conductor Service
- original idea - proxy DB access for compute nodes
- evolved into a central orchestration service
- horizontally scalable
- but one thing: the DB is the bottleneck
- plays the central part in making "Nova objects" work.
- Nova objects
- RPC calls are versioned but data isn't
- Nova objects + conductor give us that (data versioning)
- road to live upgrades
- upgrade conductor and DB
- compute nodes still use old code but conductor makes it work
- Future: do data migrations on the fly
- in more detail
- massively simplifies dealing with database, directly or over RPC
- lower bar for adding new methods and data
- bundles data and methods in versioned packages (toy sketch below)
- ...
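To pin down what "bundles data and methods in versioned packages" means, here is a toy, self-contained sketch of the versioned-object idea. This is not the real NovaObject base class; the real thing lives in nova/objects and round-trips through the conductor over RPC.

```python
# Toy sketch of the "versioned object" idea behind Nova objects
# (not the real nova.objects code).
class ToyInstance(object):
    VERSION = '1.1'                              # bumped whenever fields change
    fields = ('uuid', 'host', 'task_state')      # 'task_state' was added in 1.1

    def __init__(self, **kwargs):
        self.data = {f: kwargs.get(f) for f in self.fields}

    def to_primitive(self):
        # What actually travels over the message bus: the data plus its version.
        return {'version': self.VERSION, 'data': self.data}

    @classmethod
    def from_primitive(cls, prim):
        obj = cls(**prim['data'])
        if prim['version'] < '1.1':              # string compare is enough for this toy
            obj.data['task_state'] = None        # backfill for older senders
        return obj

inst = ToyInstance(uuid='abc', host='compute-01', task_state='spawning')
wire = inst.to_primitive()                       # would be sent via conductor/RPC
print(ToyInstance.from_primitive(wire).data)
```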
- Cells - more scale
- Scale out Nova without doing DB/MQ replication
- Each cell is a separate Nova w/o the API service + a Cells service (own DB and MQ)
- Parent cell runs Nova API and a cells scheduler that chooses a cell
- Inter cell comms over a separate message bus
- in detail
- Parent cell accepts the request and posts a message on the cells topic
- One of the cells services picks up the message, runs the scheduling code, and dispatches to the chosen cell
- most of the DB work is replicated up to the parent cell, as API code still needs to work
- Advantages
- Not invasive to current deployments
- Tree structure - built for scale
- Has real world users
- Berkeley uses it
- Downsides
- not enough upstream testing
- still deemed experimental
- a number of features not supported or broken
- no horizon support
- Future - uncertain :(
【F-5】16:45~17:25 OpenStack Nova Deepdive Advanced (Nikola Dipanov again)
- Topics
- Overview of Nova as a Python project
- Adding features to Nova
- Example: Scheduler
- Virt drivers
- Road to live upgrades
- Evolving the data model
- Motivation for this task
- Highlight some issues that influence how the project will evolve
- Common for large OSS projects
- useful for people interested in adding features
- users can add new features to Nova themselves if they want to
- Nova - the python codebase
- very large: about 400,000 lines of Python code as of a recent Juno release
- Issues with a large codebase
- no single person can be an expert
- difficult to grow the core team
- A lot of interactions with unstable APIs - coupling and tech debt
- Reviews take a long time = downward spiral
- Adding features upstream - problems
- not all APIs versioned
- data model changes usually don't get enough review
- complex interactions through ill-defined APIs - edge case bugs that get missed in the review process
- Solving - a SW engineering challenge (this is challenging)
- Scheduler - coupling example
- current design - opportunistic scheduling (no locking, potential retries)
- this requires the placement logic to be re-run on the compute host
- which in turn requires all data to be there
- correct data (format, etc...
- Booting - data view
- explained using diagram
- similar but data view explanation of $ nova boot ....
- showing python code
- Several Problems
- A lot of the data that gets passed around is not versioned
- There is no standard data model
- difficult to understand the flow of data
- There is hope, however
- Scheduler split a.k.a Gantt project
- Main idea - have a standalone service
- Code re-use (all projects implement a scheduler)
- more scalable
- Open up the ability to do cross-project aware scheduling (Cinder, Neutron)
- define the data model first
- Current (Kilo-targeted) attempts look more promising
- RH leading the effort
- Virt drivers
- Nova ships with pluggable "drivers" for several popular hypervisors
- Libvirt/KVM, Xen, Hyper-V, VMware VCenter
- Which driver the Nova compute service will load is configurable
- Upstream gate only tests libvirt-kvm; the others are tested through 3rd parties
- Split out?
- yet another place where divide and conquer can work
- the core team is the bottleneck
- very few people deeply familiar with more than one driver
- a slightly more stable API
- Can they be split out into separate repos?
- Is there a real benefit?
- Road to upgrade
- currently there is a large lockstep
- roll the DB schema forward (downtime)
- upgrade everything but compute nodes
- we have a functioning cloud now
- thanks to conductor + NovaObject
- Finally - upgrade compute nodes at your own pace
- Road to live upgrade
- Where we want to be?
- have only conductor services on the critical upgrade path
- migrate the DB schema over time (no lockstep)
- ...
- Evolving the data model
- Not the only source of problems, but a major one
- Much better now thanks to NovaObjects
- Quite performance sensitive
- because Nova hits the DB heavily when doing this
- So in short
- Nova is large - it can cause problems
- Slow down the project
- Scaling perf and quality issues
- Tech debt
- there are upstream efforts to address these issues
- this is how OSS works and need to be considered
- How can non-developers follow progress?
- There is no single answer, but:
- follow the nova-specs repository and the relevant BPs
- find out who the key people are (including, of course, the developers)
- join the Nova upstream IRC meeting (there is weekly meeting)
【F-6】17:40~18:20 Ceph loves OpenStack: Why and How (Haruka Iwao)
- An explanation of the Ceph architecture
- An overview of Ceph
- What is Ceph?
- OSS distributed storage
- supports both object and block
- has exabyte scale in its sights
- designed with more than 1,000 nodes in mind
- History - 10 years of it
- 2004: development started at UCSC
- 2014: RH acquired Inktank
- Ceph's unified storage
- Object Storage
- S3 and Swift
- multi-tenant
- keystone
- geo-replication
- Block Storage
- OpenStack
- Clone
- Snapshot
- File Storage
- POSIX
- Linux Kernel
- CIFS/NFS
- HDFS
- The community behind Ceph
- 306 developers
- 475 participants
- 1,668 discussion participants
- About Inktank Ceph Enterprise
- in a word, the commercial edition
- abbreviated to ICE
- Ceph + Calamari (monitoring tools, RESTful API) + setup tools and support
- stricter QA
- long-term support
- Benefits of ICE
- lower cost
- partly because operations are easier
- future-proofing
- long-term support
- a single, easy-to-understand pricing structure
- a roadmap
- expertise
- Ceph specialists
- support from the developers themselves
- enterprise READY
- makes use of existing infrastructure
- support with an SLA
- ICE release plan
- a release every 3 months
- named in alphabetical order
- Roadmap
- 1.2
- RHEL 7 support
- 2.0
- iSCSI
- RBD mirroring
- Ceph architecture
- RADOS - LIBRADOS - RGW, RBD, CEPHFS
- RADOS
- Reliable
- Autonomic
- the nodes talk to each other and detect failures
- Distributed
- Object Store
- the core of Ceph
- all data is stored in RADOS
- made up of two daemon types: mon and osd
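Since "all data is stored in RADOS" as plain objects, here is a minimal sketch of talking to RADOS directly through the python-rados bindings (my addition, not from the slides; the conffile path and pool name are placeholders for your own environment):

```python
# Minimal librados sketch: store one object in a pool and read it back.
# conffile/pool name are placeholders.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')          # any existing pool will do
    ioctx.write_full('hello-object', b'hello RADOS')
    print(ioctx.read('hello-object'))          # -> b'hello RADOS'
    ioctx.close()
finally:
    cluster.shutdown()
```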
- OSD
- Object Storage Daemon
- one OSD per disk
- xfs/btrfs as the backend
- uses a write-ahead journal to guarantee consistency and improve performance
- anywhere from 3 up to tens of thousands of OSDs
- mon
- monitoring daemon
- manages the cluster map and the cluster state
- runs on a small, odd number of nodes, e.g. 3 or 5
- CRUSH algorithm
- the algorithm used to decide object placement
- where an object lives is computed 100% by calculation alone
- so no metadata server is needed
- no SPoF
- scales extremely well
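The "placement is computed 100% by calculation" point is the interesting part: every client can hash an object name to a placement group and map that PG onto OSDs using nothing but the cluster map, so there is no metadata server to ask. A toy illustration of that idea (a gross simplification, not the real CRUSH algorithm):

```python
# Toy illustration of calculation-only placement (NOT the real CRUSH algorithm).
# Any client holding the same "cluster map" computes the same OSDs for an object.
import hashlib

PG_NUM = 128
OSDS = ['osd.0', 'osd.1', 'osd.2', 'osd.3', 'osd.4', 'osd.5']
REPLICAS = 3

def place(obj_name):
    pg = int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % PG_NUM
    # Pick REPLICAS distinct OSDs deterministically from the PG id.
    return [OSDS[(pg + i) % len(OSDS)] for i in range(REPLICAS)]

print(place('vm-disk-0001'))   # the same answer on every client, every time
```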
- Cluster map
- a hierarchical map of the OSDs
- replicates across failure domains
- prevents traffic from concentrating in one place
- LIBRADOS
- RGW <-> APP
- RADOS Gateway
- REST-based object store proxy
- S3 and Swift compatible
- also keeps usage statistics for billing
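Because RGW speaks the S3 API, an ordinary S3 client just works against it. A hedged sketch with boto (host, keys and bucket name are placeholders; nothing RGW-specific beyond pointing the client at the gateway):

```python
# Talking to the RADOS Gateway through its S3-compatible API with boto.
# Host and keys are placeholders.
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='rgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('demo-bucket')
key = bucket.new_key('hello.txt')
key.set_contents_from_string('stored in RADOS via RGW')
print(key.get_contents_as_string())
```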
- RBD <-> HOSTs (diagram)
- RBD + virtualization
- RBD + kernel module
- stores disk images
- striped across the whole cluster
- snapshot support
- Copy on write (CoW)
- usable from the Linux kernel, KVM and OpenStack
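And the RBD side, which is what Cinder and Glance end up driving: a sketch with the python-rbd bindings showing an image, a snapshot and a copy-on-write clone. Pool and image names are placeholders and error handling is omitted; this is my own illustration, not code from the talk.

```python
# python-rbd sketch: create an image, snapshot it, and make a CoW clone of it.
# Pool/image names are placeholders; error handling omitted for brevity.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# Format 2 + layering are needed for cloning.
rbd.RBD().create(ioctx, 'base-image', 4 * 1024 ** 3,
                 old_format=False, features=rbd.RBD_FEATURE_LAYERING)

img = rbd.Image(ioctx, 'base-image')
img.create_snap('golden')
img.protect_snap('golden')           # a snapshot must be protected before cloning
img.close()

# The clone shares unmodified data with the snapshot (copy on write).
rbd.RBD().clone(ioctx, 'base-image', 'golden', ioctx, 'vm-0001-disk',
                features=rbd.RBD_FEATURE_LAYERING)

ioctx.close()
cluster.shutdown()
```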
- CEPHFS
- POSIX-compatible shared file system
- an experimental implementation exists in the community version
- not included in ICE
- How Ceph and OpenStack work together
- a diagram of the whole picture - easy to follow
- Swift/Keystone integration
- unified authentication
- can expose a Swift-compatible API
- Glance integration
- can be used to store VM OS images
- uses the driver built into Glance
- Cinder integration
- stores disk images in RBD
- a built-in driver inside Cinder
- CoW clones can be used
- Nova/hypervisor integration
- driver integrated into KVM
- mounts volumes on RBD directly
- high performance
- no overhead because it does not go through FUSE or the like
- stability
- To wrap up
- Ceph - what is it?
- an architecture optimized for both object and block
- a single storage pool
- high disk utilization efficiency
- built-in driver support in each OpenStack component
- Why does Ceph love OpenStack?
- high performance and stability thanks to the built-in drivers
- CoW clone/snapshot support
- a large community
- RH is the largest contributor to both Ceph and OpenStack
- survey results show it is used about as widely as NFS (from the OpenStack user survey)
- the key point being that these are OpenStack users
- Ceph's advantages
- exabyte scale is in sight
- deep integration with OpenStack (the native implementation matters a lot)
- a broad and active user community
- ICE in summary
- the benefits of Ceph, for the enterprise
- a longer lifecycle
- an installer is provided
- consulting
- Calamari integration
- hotfixes provided, and fed back into the roadmap
- also available in Japan
- support is in English (for now)
- RHEL-OSP and ICE
- the two products RH offers
- a single solution for OpenStack and its storage
- support and consulting provided as a one-stop service
And that's as far as my notes go, so that's it for this post.
You might also want to read:
- I went to Redhat Forum 2014 mainly to hear about #OpenStack (keynote edition) #redhatforum
- Naoya Ito's special talk at the Mynavi News IT Summit was excellent!
- #hcj2014 I went to Hadoop Conference Japan 2014 mainly for the SQL-on-Hadoop talks (a highly personal wrap-up entry)
- #hcj2014 Notes on "Presto, a parallel SQL engine - how to turn large datasets into graphs quickly"
- #hcj2014 Notes on "Evolution of Impala - the fast SQL engine on Hadoop, latest news", where @shiumachi, having shed his happi coat and reached his final form, was unbeatable