2014-10-13

I went to Red Hat Forum 2014 mainly to hear about #OpenStack (afternoon sessions) #redhatforum


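For the afternoon sessions as well, here are some general impressions.

Nova gave me the feeling that trouble is brewing (my impression after listening to 【F-3】 and 【F-4】), and...

I came intending to hear about OpenStack, but the last session, 【F-6】 "Ceph loves OpenStack", which was about Ceph, was the most carefully presented and the easiest to follow.

It made me think that Japanese speakers really do explain things carefully. (Most sessions at Red Hat Forum were given by speakers from the US, the company's home country, so I mean in comparison with them.)

Incidentally, these are the 5 sessions I attended.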

【F-1】12:50～13:30 OpenStack, OpenDaylight and OPNFV - The relationship between OpenStack and NFV, and the latest on the OpenDaylight project (Chris Wright)

【F-3】14:40～15:20 Transform IT with RH Enterprise Linux OpenStack Platform - The latest on Red Hat's OpenStack (Jeff Jameson)

【F-4】15:50～16:30 OpenStack Nova Technical Deepdive (Nikola Dipanov)

【F-5】16:45～17:25 OpenStack Nova Deepdive Advanced (Nikola Dipanov again)

【F-6】17:40～18:20 Ceph loves OpenStack: Why and How (Haruka Iwao)

Below are the notes I took in each session.

【F-1】12:50～13:30 OpenStack, OpenDaylight and OPNFV - The relationship between OpenStack and NFV, and the latest on the OpenDaylight project (Chris Wright)

Agenda - emerging technology story.
SW defined Networking (SDN)
- SDN and NW virtualization
  - SDN is many things to many people
    - separation of control plane and data plane
    - programmatic IF for NW control
  - NW virtualization
    - Decoupling logical (overlay) NW topology from physical (underlay) topology
  - RH focus
    - virtual NWs defined using OpenStack NW service (Neutron)
    - VXLAN overlay for decoupling and scalability
    - Layer 2-7
- OpenDaylight?
  - OSS SDN
    - open
    - transparent
    - merit-based
  - Consortium
    - facilitate
    - advocate
    - support
    - RH is platinum founding member
- OpenDaylight SDN Platform
  - Modular, extensible pluggable
  - Java, OSGi, Karaf based platform
  - Evolving towards model driven using YANG
  - Multi-protocol
    - Openflow + other protocol
  - Eclipse Public License
- OpenDaylight SDN Platform
  - image
- Hydrogen Release
  - First release: Feb 4, 2014
  - Over 150 contributors, over 3M lines of code in 12 projects
  - Black Duck "Rookie of the Year"
  - Open NW summit SDN Idol Finalist
  - Winner INTEROP Best in SDN
  - Winner INTEROP Best in Show
  - Three editions
- Projects in the Hydrogen Release
  - Controller
  - VTN # big contribution of NEC
  - OpenDove
  - Affinity Metadata Service
  - LISP Mapping Service
  - Yang Tools
  - Defense4All
  - OVSDB #
  - etc... all 12 projects
- RH focus on the Neutron connection, the OpenStack service
- RH ODL Focus
  - Integration with OpenStack
    - ML2 ODL driver
  - Overlay NWs
  - Standards based
    - OVSDB, OpenFlow 1.3, OpenStack Neutron
- Helium Release
  - 2nd release, Oct 2, 2014
  - over 200 contributors, over 4M lines of code in 21 projects
  - One Karaf "edition"
    - Feature-based configuration
  - commit statistics
    - pie chart showing different companies' contributions
      - RH is the No. 2 contributor
- Helium Release
  - AAA
  - etc. (21 projects in total)
- RH focused on
  - Continued integration with OpenStack
    - ML2 ODL Driver + extensions (L3, *aaS)
  - Overlay NW management
    - Add OVSDB HW_VTEP schema support
    - Underlay informing (e.g. QoS)
  - MD-SAL
  - AAA # Authorization: keystone(OpenStack)
  - OpFlex
  - SFC (NFV context)
  - Infrastructure (testing and performance)
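To make the "VXLAN overlay for decoupling" point above a bit more concrete, here is a minimal sketch of my own (not from the slides) of the 8-byte VXLAN header defined in RFC 7348. The function names are made up for illustration, and the outer Ethernet/IP/UDP headers that a real encapsulation needs are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, followed by 24 reserved bits.
    # Word 2: VNI in the top 24 bits, followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header (outer headers omitted)."""
    return pack_vxlan_header(vni) + inner_frame
```

The 24-bit VNI is what gives the overlay its scalability: roughly 16 million logical networks can share one physical underlay, compared with 4096 VLAN IDs.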
Optimized Data Plane
- Open vSwitch
  - Multi-layer virtual switch
    - OVSDB: config managed by
    - OpenFlow: flow tables controlled by
  - Provides connection between VMs on same Host
  - Provides uplink to physical NW via host NIC
  - Data fast path in-kernel
  - Challenges
    - kernel NW stack can be bottleneck
    - 64bytes packet processing rates suffer
    - Microflows vs. megaflows
- DPDK
  - Library for userspace packet processing
  - Directly manages NIC with userspace poll mode driver (PMD)
  - Polls driver NIC for packets, NIC DMAs directly to application buffers
  - Platform specific optimizations
    - Hugepages, NUMA and cacheline aware
    - Batched packet processing
    - CPU instructions (SSE4, AVX, etc)
  - Challenges
    - API/ABI compatibility, difficult to package in distribution
    - Duplicate driver stacks, limited driver support
    - compile time rather than runtime optimizations
    - Currently x86-centric
    - OVS integration disables kernel features
- OVS + DPDK
  - Intel reports improved packet processing rates
  - 10 times faster than OVS with kernel vhost
- ivshmem and memnic
  - ivshmem - to share memory between VMs
  - memnic - format shared memory segment as NIC
  - Challenges
    - ivshmem not well supported in upstream QEMU
    - disables live migration
    - new driver in VM
- vhost-user
  - vhost-net allows virtio to bypass QEMU, all in kernel
  - OVS + DPDK is in userspace
  - vhost-user allows virtio to bypass QEMU, all in userspace
  - Challenges
    - performance parity w/memnic
- SR-IOV
  - capable NIC has embedded switch
  - ...
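The "microflows vs. megaflows" item in the Open vSwitch challenges above is easier to see with a toy model. The following is a rough sketch under my own assumptions (the packet is a plain dict, and the field list and class names are invented); it is not Open vSwitch code. A microflow cache keys on every header field, while a megaflow cache keys only on the fields a rule actually inspected, with one hash table per mask.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, simplified set of header fields for illustration only.
FIELDS = ("in_port", "eth_dst", "ip_dst", "tcp_src", "tcp_dst")
Packet = Dict[str, object]


class MicroflowCache:
    """Exact match: one entry per distinct combination of all header fields."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple, str] = {}

    def _key(self, pkt: Packet) -> Tuple:
        return tuple(pkt[f] for f in FIELDS)

    def lookup(self, pkt: Packet) -> Optional[str]:
        return self.entries.get(self._key(pkt))

    def insert(self, pkt: Packet, action: str) -> None:
        self.entries[self._key(pkt)] = action


class MegaflowCache:
    """Wildcarded match: entries key only on the fields named in their mask."""

    def __init__(self) -> None:
        # One hash table per mask (a tuple of field names).
        self.tables: Dict[Tuple[str, ...], Dict[Tuple, str]] = {}

    def lookup(self, pkt: Packet) -> Optional[str]:
        # Try each mask in turn; a hit on any table decides the action.
        for mask, table in self.tables.items():
            action = table.get(tuple(pkt[f] for f in mask))
            if action is not None:
                return action
        return None

    def insert(self, pkt: Packet, mask: Tuple[str, ...], action: str) -> None:
        key = tuple(pkt[f] for f in mask)
        self.tables.setdefault(mask, {})[key] = action
```

With 100 TCP connections to the same destination IP, the microflow cache in this model ends up with 100 entries, while a megaflow entry masked on `ip_dst` alone covers all of them with a single entry.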
NW functions virtualization (NFV)
- NFV
  - NW functions are trapped in function specific HW
  - virtualize NW functions
  - Distribute VNFs on COTS-based IaaS - a Cloud
  - Steer traffic with SDN
- Why NFV?
  - Reduce time to market for new services
    - improve business agility
  - Reduce CAPEX and OPEX
- NFV value to Ops
- OPNFV?
  - OSS NFV reference implementation
  - Consortium
    - Facilitate
    - advocate
    - Support
    - RH is platinum founding member
  - Architecture
    - OpenDaylight
    - Linux KVM
    - OVS + DPDK
    - OpenStack
All routes lead to OpenStack
- Putting it all together
- NFV OpenStack Challenges
  - Performance
  - Determinism
  - Reliability
- NFV OpenStack performance and Determinism
  - NUMA aware CPU, memory and IO scheduling
  - VM memory backed by hugepages
  - ...
- Reliability
  - All infra deployed with HA
  - VM HA (non-cloud aware application)
  - rich monitoring requirement
    - Fault detection, resource consumption
    - ability to monitor KPIs
- NFV OpenStack Misc
- Making NFV and OpenStack real
  - wiki.openstack NFV

【F-3】14:40～15:20 Transform IT with RH Enterprise Linux OpenStack Platform - The latest on Red Hat's OpenStack (Jeff Jameson)

Workloads are transforming again
- Traditional workloads to cloud workloads
  - Traditional workloads
    - typically resides on a single large virtual machine
    - cannot tolerate downtime
    - requires HA
    - application scales up rather than out
  - Cloud workloads
    - workload resides on multiple VMs
    - tolerates failure
- Why are we doing this?
  - Our data is too large
    - vast amount of data
    - way past the ability of traditional systems and apps
    - scaling up no longer works
  - Service requests are too large
    - more and more client devices coming online
    - much harder to maintain service to customers
  - Applications weren't written to cope with demand
- Why OpenStack?
  - Brings public cloud-like capabilities into your DC
  - provides massive on-demand (scale out) capacity
    - 1,000s -> 10,000s -> 100ks of VMs
  - It's Open
  - Community development = higher "feature velocity"
    - the features and functions you need, faster to market than proprietary SW
- What is OpenStack?
  - A massively scalable infra as a service platform
    - HORIZON, NOVA, GLANCE, SWIFT, NEUTRON, CINDER, HEAT, CEILOMETER, KEYSTONE
    - each is developed independently, but they work closely together
  - Designed as modular services
  - Built for scale out architecture
- Why RH?
  - OpenStack is dependent on the underlying Linux
    - Running on top of the Linux OS.
    - dependent on all Linux functionality
      - performance, etc... all.
  - needs access to x86 HW resources
  - Needs an operating environment, hypervisor, other system services
  - Uses existing code libraries for functionality
  - and they are sure RH Enterprise Linux is truly reliable.
    - OpenStack is optimized and co-engineered with RH Linux
- The importance of integration with RH enterprise Linux
  - A typical OpenStack cloud is made up of
    - core cloud services
      - nova, glance, swift, ...
    - Plugins to interact with 3rd party ....
  - Examples of RHEL optimized enablers for OpenStack
    - Virtualization
    - Security - SELinux
    - NW - SDN/OVS
    - Storage - vendor plugins, performance, thin provisioning (Ceph)
    - Ecosystem - certification of HW, Storage, and NW
  - the pairing of the Linux OS and OpenStack is so close that RH is uniquely positioned to most effectively support functionality, performance, security, system-wide stability, and the ecosystem
- World's largest OpenStack partner ecosystem
  - RH OpenStack Cloud Infrastructure Partner Network
    - 235+ members
    - over 900 certified solutions in partner Marketplace
    - over 4,000 RHEL certified compute servers
    - over 13,000 applications available on RHEL
    - Large catalog of windows certified applications
- RH community leadership
  - top contributor to Juno release
    - activity.openstack.org/dash/browser
  - Proof that RH has skills, resources to
    - Support, etc...
  - wide ranging participation, contrasts with most others who are more narrowly focused
  - RH has created enterprise distribution
- service for OpenStack and Cloud
  - Training
  - Certification
  - Consulting
- Who is actually using this (customer success)?
  - NCI (National Computational Infrastructure)
    - AU-based organization
    - Deployed RH Enterprise Linux OpenStack Platform
    - requires the security certifications RH provided
  - NANYANG TECHNOLOGICAL UNIV.
    - Deployed a hybrid cloud infra with RH Enterprise Linux OpenStack Platform
      - scalability
      - automatic resource provisioning
      - saving cost (allowed better use of existing resources)
      - greater collab between agencies
- Summary
  - All the benefits of community OpenStack, plus
    - Enterprise hardened code
    - integrated with RH Enterprise Linux
    - Enterprise SW lifecycle
    - World-wide global support
    - Partner ecosystem
    - training, certification, and consulting
    - integrated with trusted stack
      - RH CloudForms
      - RH Enterprise Virtualization
      - RH Storage (incl. Ceph)
      - Foundation for OpenShift (PaaS)
- OpenStack enables user to realize hybrid cloud. e.g. AWS + On-premise
- What analysts are saying
  - RH is applying its experience in commercializing OSS Linux for the enterprise, and its methodology, to OpenStack
  - The company has made some smart moves in the OpenStack space and it'll work out for them... they've always been the OSS company
- 3 ways to get RH OpenStack
  - 90-day evaluation
  - Purchase supported product
    - Enterprise Linux OpenStack Platform
    - Cloud Infrastructure

【F-4】15:50～16:30 OpenStack Nova Technical Deepdive (Nikola Dipanov)

whoami
- Hacking on Nova since 2012
- Core reviewer since 2013
Topics covered
- Overview of Nova deployment and services
- A look at how services communicate
- Closer look into internals of some of them (conductor and scheduler)
- Nova Objects
- Nova Cells Services
OpenStack Nova in a nutshell
- Manage cloud compute resources through a REST API
- Schedule and provision VMs
- Storage and NWing handled by other components
- VM lifecycle management (start, stop, resize, snapshot...)
Nova is
- service oriented architecture
- a number of services with different functions communicating through a message bus
- system state kept in a central DB
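As a way of picturing the three points above (separate services, a message bus between them, state in one central DB), here is a toy in-process sketch of my own. `MessageBus`, `ComputeService` and the dict standing in for the DB are all invented for illustration; real Nova uses oslo.messaging over a broker and a SQL database.

```python
import queue
from collections import defaultdict


class MessageBus:
    """Toy stand-in for the message bus: one FIFO queue per topic."""

    def __init__(self) -> None:
        self.topics = defaultdict(queue.Queue)

    def cast(self, topic: str, method: str, **kwargs) -> None:
        """Fire-and-forget: enqueue the message and return immediately."""
        self.topics[topic].put((method, kwargs))

    def deliver_one(self, topic: str, service: object):
        """Pop one message from the topic and invoke it on the given service."""
        method, kwargs = self.topics[topic].get_nowait()
        return getattr(service, method)(**kwargs)


class ComputeService:
    """Toy compute service that records instance state in the shared DB."""

    def __init__(self, db: dict) -> None:
        self.db = db

    def build_and_run_instance(self, name: str) -> str:
        self.db[name] = "ACTIVE"
        return name


# Usage: the API side casts and moves on; the compute side picks the message up later.
central_db: dict = {}
bus = MessageBus()
bus.cast("compute", "build_and_run_instance", name="test")
state_before = dict(central_db)          # still empty: cast does not wait
bus.deliver_one("compute", ComputeService(central_db))
```

The point of the sketch is only the decoupling: the caller never holds a reference to the compute service, and the result becomes visible to everyone through the central state.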
Logical diagram (image)
- Queue
  - nova-api
  - console
  - compute
  - etc..
Services - cast of characters
- Core: APIs, Scheduler, Conductor, Compute, and maybe NW
- Helper: Console proxies, consoleauth, objectstore
- Non-nova: MariaDB, RabbitMQ, memcached, libvirtd
Actual deployment image
- DC scenario
- general case, a lot of compute nodes
  - Cloud controller spread across multiple nodes
  - Compute nodes with disks
  - optional
    - monitoring, VPN, etc..
a.k.a. how services communicate
- $ nova boot --image fedora --flavor 1 test
  - explaining how this works with diagram.
    - API
    - Scheduler
    - Conductor
      - Compute
      - libvirtd
- RPC and oslo.messaging
  - General purpose messaging library
  - maps closely to AMQP, but is not limited to it
  - different drivers (Rabbit, qpid, zmq)
  - supports versions (versioning done in app code)
  - by default uses eventlet green threads for dispatching conn
  - ex. python
    - cctxt = self.client.prepare(server=host, version=version)
    - cctxt.cast(ctxt, 'build_and_run_instance', **data)
- Scheduler
  - Filter scheduler
  - The only service that is not completely horizontally scalable
  - designed to be non-blocking and to favor quick decisions over correctness
  - in practice, can be a bottleneck because it "learns the world" on every request
    - actually there is a caching technique.
- Scheduler in more detail
  - opportunistic scheduling - requests can fail when capacity is low
  - simple filtering logic
  - ...
- Conductor Service
  - original idea - proxy DB access for compute nodes
  - evolved into a central orchestration service
  - horizontally scalable
    - but one thing, DB is the bottleneck
  - plays the central part in making "Nova objects" work.
    - Nova objects
      - RPC calls are versioned but data isn't
      - Nova objects + conductor give us that (data versioning)
      - road to live upgrades
        - upgrade conductor and DB
        - compute nodes still use old code but conductor makes it work
        - Future: do data migrations on the fly
      - in more detail
        - massively simplifies dealing with the database, directly or over RPC
        - lowers the bar for adding new methods and data
        - bundles data and methods in a versioned package
        - ...
- Cells - more scale
  - Scale out Nova without doing DB/MQ replication
  - Each cell is a separate Nova w/o API service + a Cell service (own DB and MQ)
  - Parent cell runs Nova API and a cells scheduler that chooses a cell
  - Inter cell comms over a separate message bus
  - in detail
    - Parent cell accepts the request and posts a message on the cells topic
    - One of the cells services picks up the message, runs the scheduling code, and dispatches to the chosen cell
    - most of the DB work is replicated up to the parent cell, as API code still needs to work
  - Advantages
    - Not invasive to current deployments
    - Tree structure - built for scale
    - Has real world users
      - Barkley uses
  - Downsides
    - not enough upstream testing
    - still deemed experimental
    - a number of features not supported or broken
    - no horizon support
    - Future - uncertain :(
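Going back to the filter scheduler mentioned earlier in this session's notes ("simple filtering logic", quick decisions over correctness), the basic idea can be sketched roughly as follows. This is my own simplified illustration, not Nova's code: `HostState`, `Request`, the two filters and the "most free RAM wins" weighing are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HostState:
    name: str
    free_ram_mb: int
    free_disk_gb: int


@dataclass
class Request:
    ram_mb: int
    disk_gb: int


Filter = Callable[[HostState, Request], bool]


def ram_filter(host: HostState, req: Request) -> bool:
    """Keep hosts that have enough free RAM for the request."""
    return host.free_ram_mb >= req.ram_mb


def disk_filter(host: HostState, req: Request) -> bool:
    """Keep hosts that have enough free disk for the request."""
    return host.free_disk_gb >= req.disk_gb


def select_host(hosts: List[HostState], req: Request,
                filters: List[Filter]) -> HostState:
    """Filter step: drop unsuitable hosts. Weigh step: prefer the most free RAM."""
    candidates = [h for h in hosts if all(f(h, req) for f in filters)]
    if not candidates:
        raise LookupError("no valid host found")
    return max(candidates, key=lambda h: h.free_ram_mb)
```

For example, with hosts `a` (1 GB free RAM), `b` (8 GB) and `c` (4 GB), a 2 GB request filters out `a` and then picks `b`. Note that every call walks the full host list, which matches the "learns the world on every request" remark above.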

【F-5】16:45～17:25 OpenStack Nova Deepdive Advanced (Nikola Dipanov again)

Topics
- Overview of Nova as a Python project
- Adding features to Nova
- Example: Scheduler
- Virt drivers
- Road to live upgrades
- Evolving the data model
Motivation for this task
- Highlight some issues that influence how the project will evolve
- Common for large OSS projects
- useful for people interested in adding features
  - users can add new features to Nova themselves if they want to
Nova - the python codebase
- very large: about 400,000 lines of Python code as of the recent Juno release
  - Issues with a large codebase
    - no single person can be an expert
    - difficult to grow the core team
    - A lot of interactions with unstable APIs - coupling and tech debt
    - Reviews take a long time = downward spiral
Adding features upstream: problems
- not all APIs versioned
- data model changes usually not done with high enough review
- complex interactions through ill-defined APIs - edge case bugs that get missed in the review process
- Solving - a SW engineering challenge (this is challenging)
Scheduler - coupling example
- current design - opportunistic scheduling (no locking, potential retries)
- this requires the placement logic to be re-run on the compute host
- which in turn requires all data to be there
  - correct data (format, etc...)
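The coupling described above (no locking, so the compute host has to re-check the placement, and a failed check means a retry elsewhere) can be sketched like this. It is a minimal model under my own assumptions: `ComputeHost.claim`, the `stale_view` dict and the retry limit are invented names, not Nova's actual interfaces.

```python
from typing import Dict, List, Tuple


class ComputeHost:
    """Holds the real, current free RAM, which the scheduler may not know yet."""

    def __init__(self, name: str, free_ram_mb: int) -> None:
        self.name = name
        self.free_ram_mb = free_ram_mb

    def claim(self, ram_mb: int) -> bool:
        """Re-run the placement check locally against the host's real state."""
        if self.free_ram_mb < ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        return True


def schedule(stale_view: Dict[str, int], hosts: Dict[str, ComputeHost],
             ram_mb: int, max_attempts: int = 3) -> Tuple[str, List[str]]:
    """Pick hosts from a possibly stale view, without locks; retry on a failed claim."""
    tried: List[str] = []
    for _ in range(max_attempts):
        candidates = [name for name, free in stale_view.items()
                      if free >= ram_mb and name not in tried]
        if not candidates:
            break
        choice = max(candidates, key=stale_view.get)
        tried.append(choice)
        if hosts[choice].claim(ram_mb):
            return choice, tried
    raise RuntimeError("scheduling failed after trying: %s" % tried)
```

If the scheduler believes host `a` has 4 GB free while it really has 512 MB, a 1 GB request is first sent to `a`, fails the claim there, and lands on the next candidate. The claim only works if the compute host receives everything the scheduler used for its decision, in a format it understands, which is exactly the data coupling the talk was pointing at.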
Booting - data view
- explained using diagram
- similar but data view explanation of $ nova boot ....
- showing python code
Several Problems
- A lot of the data that gets passed around is not versioned
- There is no standard data model
- difficult to understand the flow of data
There is hope, however
Scheduler split a.k.a Gantt project
- Main idea - have a standalone service
- Code re-use (all projects implement a scheduler)
- more scalable
- Open up the ability to do cross-project aware scheduling (Cinder, Neutron)
- define the data model first
- Current (Kilo targeted) attempts look more promising
- RH leading the effort
Virt drivers
- Nova ships with pluggable "drivers" for several popular hypervisors
- Libvirt/KVM, Xen, Hyper-V, VMware vCenter
- Which driver the Nova compute service loads is configurable
- Upstream gate only tests libvirt-kvm; the others are tested through 3rd parties
- Split out?
  - yet another place where divide and conquer can work
  - core team is the bottleneck
  - very few people deeply familiar with more than one
  - a slightly more stable API
  - Can they be split out into separate repos?
  - Is there a real benefit?
Road to upgrade
- currently there is a large lockstep
  - roll the DB schema forward (downtime)
  - upgrade everything but compute nodes
- we have a functioning cloud now
  - thanks to conductor + NovaObject
- Finally - upgrade compute nodes at your own pace
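The reason old compute nodes keep working against an upgraded conductor is data versioning: the newer side can hand an object back in the older shape. A rough sketch of that idea, under my own assumptions (the field names, the version table and `backport` are hypothetical and are not the real Nova objects API):

```python
from typing import Dict, Tuple

# Hypothetical object history: version 1.1 added "numa_topology" to version 1.0.
CURRENT_VERSION = "1.1"
FIELD_ADDED_IN: Dict[str, Tuple[int, int]] = {
    "uuid": (1, 0),
    "vm_state": (1, 0),
    "numa_topology": (1, 1),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.1" into (1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def to_primitive(fields: dict) -> dict:
    """Serialize an object together with the version of its data format."""
    return {"version": CURRENT_VERSION, "data": dict(fields)}


def backport(primitive: dict, target_version: str) -> dict:
    """Drop fields the target version does not know about (conductor's job here)."""
    target = parse_version(target_version)
    if target > parse_version(primitive["version"]):
        raise ValueError("cannot backport to a newer version")
    data = {name: value for name, value in primitive["data"].items()
            if FIELD_ADDED_IN[name] <= target}
    return {"version": target_version, "data": data}
```

A compute node that only understands 1.0 would ask for `backport(obj, "1.0")` and never see the field it cannot handle, which is what lets the compute fleet be upgraded later than the rest.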
Road to live upgrade
- Where we want to be?
  - have only conductor services on the critical upgrade path
  - migrate the DB schema over time (no lockstep)
- ...
Evolving the data model
- Not the only source of problems, but a major one
- Much better now thanks to NovaObjects
- Quite performance sensitive
  - because Nova uses the DB heavily
So in short
- Nova is large - it can cause problems
  - Slow down the project
  - Scaling perf and quality issues
  - Tech debt
- there are upstream efforts to address these issues
  - this is how OSS works and needs to be taken into account
How can non-developers follow progress
- There is no single answer, but
- follow the nova-specs repository and relevant BPs
- find out who the key people (including, of course, the developers) are
- join the Nova upstream IRC meeting (there is weekly meeting)

【F-6】17:40～18:20 Ceph loves OpenStack: Why and How (Haruka Iwao)

An explanation of Ceph's architecture
Ceph overview
- What is Ceph?
  - OSS distributed storage
  - Supports both object and block storage
  - Targets exabyte scale
  - Designed for more than 1,000 nodes
- History: 10 years of history
  - 2004: development started at UCSC
  - 2014: RH acquired Inktank
- Ceph's unified storage
  - Object Storage
    - S3 and Swift
    - multi-tenant
    - keystone
    - geo-replication
  - Block Storage
    - OpenStack
    - Clone
    - Snapshot
  - File Storage
    - POSIX
    - Linux Kernel
    - CIFS/NFS
    - HDFS
- The community behind Ceph
  - 306 developers
  - 475 participants
  - 1,668 discussion participants
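About Inktank Ceph Enterprise
- In a word, the commercial edition
- Abbreviated as ICE
- Ceph + Calamari (monitoring tools, RESTful API) + setup tools and support
  - Stricter QA
  - Long-term support
- Benefits of ICE
  - Low cost
    - Partly because operations are simple
  - Future-proof
    - Long-term support
    - A single, easy-to-understand pricing model
    - Roadmap
  - Expertise
    - Ceph experts
    - Support from the developers
  - Enterprise ready
    - Makes use of existing infrastructure
    - Support with an SLA
- ICE release plan
  - A release every 3 months
    - In alphabetical order
- Roadmap
  - 1.2
    - RHEL 7 support
  - 2.0
    - iSCSI
    - RBD mirroring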
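Ceph architecture
- RADOS - LIBRADOS - RGW, RBD, CEPHFS
- RADOS
  - Reliable
  - Autonomic
    - Daemons communicate with each other and detect failures
  - Distributed
  - Object store
  - The core of Ceph
  - All data is stored in RADOS
  - Consists of two kinds of daemons: mon and osd
    - OSD
      - Object Storage Daemon
      - One OSD per disk
      - Uses xfs/btrfs as the backend
      - Uses a write-ahead journal to guarantee consistency and improve performance
      - The number of OSDs ranges from 3 to tens of thousands
    - mon
      - monitoring daemon
      - Manages the cluster map and the cluster state
      - Can be run with a small, odd number of instances, such as 3 or 5
  - CRUSH algorithm
    - The algorithm used for object placement
    - The placement location is determined 100% by computation alone
    - So no metadata server is needed
      - No SPoF
      - Very good scalability
    - Cluster map
      - A hierarchical map of OSDs
        - Replicates across failure domains
        - Prevents traffic concentration
- LIBRADOS
- RGW <-> APP
  - RADOS Gateway
  - REST-based object store proxy
  - S3 and Swift compatible
  - Also provides usage statistics for billing
- RBD <-> HOSTs (diagram)
  - RBD + virtualization
  - RBD + kernel module
  - Stores disk images
  - Striped across the whole cluster
  - Snapshot support
  - Copy on write (CoW)
  - Usable from the Linux kernel, KVM, and OpenStack
- CEPHFS
  - POSIX-compatible shared file system
  - An experimental implementation exists in the community edition
  - Not included in ICE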
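The CRUSH point in the RADOS notes above (placement is computed, so no metadata server has to be asked) is the part I found most interesting, so here is a toy illustration. To be clear, this is not the CRUSH algorithm itself: I am substituting plain highest-random-weight (rendezvous) hashing over a two-level map of racks and OSDs, and the map layout and function names are my own assumptions.

```python
import hashlib
from typing import Dict, List


def _score(object_name: str, bucket: str) -> int:
    """Deterministic pseudo-random score for an (object, bucket) pair."""
    digest = hashlib.sha256(f"{object_name}/{bucket}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, cluster_map: Dict[str, List[str]],
          replicas: int) -> List[str]:
    """Pick `replicas` OSDs, each in a different rack, by computation alone.

    cluster_map maps a failure domain (here: a rack) to the OSDs inside it.
    """
    if replicas > len(cluster_map):
        raise ValueError("not enough failure domains for the replica count")
    # Rank racks by score and take the top ones, so replicas span racks.
    racks = sorted(cluster_map, key=lambda rack: _score(object_name, rack),
                   reverse=True)[:replicas]
    # Within each chosen rack, take the highest-scoring OSD.
    return [max(cluster_map[rack], key=lambda osd: _score(object_name, osd))
            for rack in racks]
```

Any client holding the same cluster map computes the same answer for the same object name, which is why there is no lookup service to become a SPoF or a bottleneck, and choosing one OSD per rack is the "replicate across failure domains" part.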
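Ceph and OpenStack integration
- A diagram showing the whole picture (easy to read)
- Swift/Keystone integration
  - Unified authentication
  - Can provide a Swift-compatible API
- Glance integration
  - Can be used to store VM OS images
  - Uses the driver built into Glance
- Cinder integration
  - Stores disk images in RBD
  - Driver built into Cinder
  - CoW clones are available
- NOVA/Hypervisor integration
  - Driver integrated into KVM
  - Mounts volumes on RBD directly
  - High performance
    - No overhead, since it does not go through FUSE or the like
  - Stability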
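The CoW clones mentioned under the Cinder integration are what make booting many VMs from one image cheap. A minimal sketch of the copy-on-write idea, with made-up `Snapshot` and `Clone` classes and dict-based "blocks" (this models the concept only and is not the librbd API):

```python
from typing import Dict


class Snapshot:
    """Read-only image: a mapping from block index to block contents."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self._blocks = dict(blocks)

    def read(self, index: int) -> bytes:
        return self._blocks.get(index, b"")


class Clone:
    """Writable child of a snapshot that stores only the blocks it has changed."""

    def __init__(self, parent: Snapshot) -> None:
        self.parent = parent
        self.own: Dict[int, bytes] = {}

    def read(self, index: int) -> bytes:
        # Blocks never written in the clone are served from the parent.
        if index in self.own:
            return self.own[index]
        return self.parent.read(index)

    def write(self, index: int, data: bytes) -> None:
        # The parent is never modified; the new data lives in the clone only.
        self.own[index] = data
```

Creating a clone copies nothing up front, so ten VMs cloned from one golden image initially use the space of one image, and each clone grows only by the blocks it overwrites.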
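In closing
- Ceph What?
  - Has an architecture optimized for both object and block storage
  - A single storage pool
    - High disk utilization efficiency
  - Built-in driver support in each OpenStack component
- Why Ceph loves OpenStack?
  - High performance and stability thanks to the built-in drivers
  - Clone and snapshot support via CoW
  - A large community
  - RH is the largest contributor to both Ceph and OpenStack
  - Survey results show it is as widely used as NFS (from the OpenStack user survey)
    - The key point is that these are OpenStack users
- Ceph's advantages
  - Exabyte scale is in sight
  - Deep integration with OpenStack (a native implementation matters a lot)
  - A broad and active user community
- ICE summary
  - Ceph's benefits for the enterprise
    - A longer lifecycle
    - An installer is provided
    - Consulting
  - Calamari integration
  - Hotfixes provided, and feedback reflected in the roadmap
  - Also offered in Japan
    - Support is in English (for now)
- RHEL-OSP and ICE
  - Two products RH offers
  - A single solution for OpenStack and its storage
  - One-stop support and consulting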