
I went to Red Hat Forum 2014 mainly to hear about #OpenStack (afternoon sessions) #redhatforum



This continues from the first part (the keynote sessions).

A few quick impressions of the afternoon sessions as well:

  • Nova gave off a rather worrying vibe (my takeaway from 【F-3】 and 【F-4】), and...
  • I had come to hear about OpenStack, but the final session on Ceph, "【F-6】 Ceph loves OpenStack", turned out to be the most carefully delivered and easiest to follow
    • Japanese speakers really do explain things carefully (most Red Hat Forum sessions were given by speakers from the US, so that is who I am comparing against)

For reference, the five sessions I attended were the following:

  • 【F-1】12:50~13:30 OpenStack, OpenDaylight and OPNFV - the relationship between OpenStack and NFV, plus the latest on the OpenDaylight project - Chris Wright
  • 【F-3】14:40~15:20 Transform IT with RH Enterprise Linux OpenStack Platform - Red Hat's latest on OpenStack - Jeff Jameson
  • 【F-4】15:50~16:30 OpenStack Nova Technical Deepdive - Nikola Đipanov
  • 【F-5】16:45~17:25 OpenStack Nova Deepdive Advanced - Nikola Đipanov again
  • 【F-6】17:40~18:20 Ceph loves OpenStack: Why and How - Haruka Iwao

Below are the notes I took in each session.

【F-1】12:50~13:30 OpenStack, OpenDaylight and OPNFV - the relationship between OpenStack and NFV, plus the latest on the OpenDaylight project - Chris Wright

  • Agenda - emerging technology story.
  • SW defined Networking (SDN)
    • SDN and NW virtualization
      • SDN is many things to many people
        • separation of control plane and data plane
        • programmatic IF for NW control
      • NW virtualization
        • Decoupling logical (overlay) NW topology from physical (underlay) topology
      • RH focus
        • virtual NWs defined using OpenStack NW service (Neutron); see the API sketch after these notes
        • VXLAN overlay for decoupling and scalability
        • Layer 2-7
    • OpenDaylight?
      • OSS SDN
        • open
        • transparent
        • merit-based
      • Consortium
        • facilitate
        • advocate
        • support
        • RH is platinum founding member
    • OpenDaylight SDN Platform
      • Modular, extensible, pluggable
      • Java, OSGi, Karaf based platform
      • Evolving towards model driven using YANG
      • Multi-protocol
        • Openflow + other protocol
      • Eclipse Public License
    • OpenDaylight SDN Platform (architecture diagram)
    • Hydrogen Release
      • First released Feb 4, 2014
      • Over 150 contributors, over 3M lines of code in 12 projects
      • Black Duck "Rookie of the Year"
      • Open NW summit SDN Idol Finalist
      • Winner INTEROP Best in SDN
      • Winner INTEROP Best in Show
      • Three editions
    • Projects in the Hydrogen Release
      • Controller
      • VTN # big contribution of NEC
      • OpenDove
      • Affinity Management Service
      • LISP Mapping Service
      • Yang Tools
      • Defense4All
      • OVSDB #
      • etc... all 12 projects
    • RH focus on Neutron connection, OpenStack service
    • RH ODL Focus
      • Integration with OpenStack
        • ML2 ODL driver
      • Overlay NWs
      • Standards based
        • OVSDB, OpenFlow 1.3, OpenStack Neutron
    • Helium Release
      • 2nd release, Oct 2, 2014
      • over 200 contributors, over 4M lines of code in 21 projects
      • One Karaf "edition"
        • Feature-based configuration
      • commit statistics
        • pie chart showing different companies' contributions
          • RH is the No. 2 contributor
    • Helium Release
      • AAA
      • etc., 21 projects in all
    • RH focused on
      • Continued integration with OpenStack
        • ML2 ODL Driver + extensions (L3, *aaS)
      • Overlay NW management
        • Add OVSDB HW_VTEP schema support
        • Underlay informing (e.g. QoS)
      • MD-SAL
      • AAA # Authorization: keystone(OpenStack)
      • OpFlex
      • SFC (NFV context)
      • Infrastructure (testing and performance)
  • Optimized Data Plane
    • Open vSwitch
      • Multi-layer virtual switch
        • config managed via OVSDB
        • flow tables controlled via OpenFlow
      • Provides connection between VMs on same Host
      • Provides uplink to physical NW via host NIC
      • Data fast path in-kernel
      • Challenges
        • kernel NW stack can be bottleneck
        • 64-byte packet processing rates suffer
        • Microflows vs. megaflows
    • DPDK
      • Library for userspace packet processing
      • Directly manages NIC with userspace poll mode driver (PMD)
      • Polls the NIC for packets; NIC DMAs directly to application buffers
      • Platform specific optimizations
        • Hugepages, NUMA and cacheline aware
        • Batched packet processing
        • CPU instructions (SSE4, AVX, etc)
      • Challenges
        • API/ABI compatibility, difficult to package in distribution
        • Duplicate driver stacks, limited driver support
        • compile time rather than runtime optimizations
        • Currently x86-centric
        • OVS integration disables kernel features
    • OVS + DPDK
      • Intel reports improved packet processing rates
      • 10 times faster than OVS with kernel v-host
    • ivshmem and memnic
      • ivshmem - to share memory between VMs
      • memnic - format shared memory segment as NIC
      • Challenges
        • ivshmem not well supported in upstream QEMU
        • disables live migration
        • new driver in VM
    • vhost-user
      • vhost-net allows virtio to bypass QEMU, all in kernel
      • OVS + DPDK is in userspace
      • vhost-user allows virtio to bypass QEMU, all in userspace
      • Challenges
        • performance parity w/memnic
    • SR-IOV
      • capable NIC has embedded switch
      • ...
  • NW functions virtualization (NFV)
    • NFV
      • NW functions are trapped in function-specific HW
      • virtualize NW functions
      • Distribute VNFs on COTS-based IaaS - a Cloud
      • Steer traffic with SDN
    • Why NFV?
      • Reduce time to market for new services
        • improve business agility
      • Reduce CAPEX and OPEX
    • NFV value to Ops
    • OPNFV?
      • OSS NFV reference implementation
      • Consortium
        • facilitate
        • advocate
        • support
        • RH is platinum founding member
      • Architecture
        • OpenDaylight
        • Linux KVM
        • OVS + DPDK
        • OpenStack
  • All routes lead to OpenStack
    • Putting it all together
    • NFV OpenStack challenges
      • Performance
      • Determinism
      • Reliability
    • NFV OpenStack performance and Determinism
      • NUMA-aware CPU, memory and IO scheduling
      • VM memory backed by hugepages
      • ...
    • Reliability
      • All infra deployed with HA
      • VM HA (non-cloud aware application)
      • rich monitoring requirement
        • Fault detection, resource consumption
        • ability to monitor KPIs
    • NFV OpenStack Misc
    • Making NFV and OpenStack real
      • wiki.openstack NFV
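
(Not from the talk, but to make the "virtual NWs defined using Neutron" bullet concrete for myself, here is a minimal sketch that creates an overlay network by calling the Neutron v2.0 REST API directly. The endpoint URL, token and names are placeholders I made up; whether the result is actually a VXLAN overlay, and whether OpenDaylight programs the flows, depends on which ML2 mechanism driver the Neutron server is configured with.)

```python
# Minimal sketch: defining a virtual (overlay) network through the Neutron
# v2.0 REST API. Endpoint, token and names below are made-up placeholders.
import json
import requests

NEUTRON_URL = "http://controller:9696/v2.0"  # hypothetical Neutron endpoint
TOKEN = "replace-with-a-keystone-token"      # obtained from Keystone beforehand

headers = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}

# Create a tenant network; with the ML2 plugin this would typically be backed
# by a VXLAN overlay, which is the decoupling the session talked about.
net_body = {"network": {"name": "demo-net", "admin_state_up": True}}
net = requests.post(NEUTRON_URL + "/networks", headers=headers,
                    data=json.dumps(net_body)).json()["network"]

# Attach a subnet so instances get addresses on the overlay.
subnet_body = {"subnet": {"network_id": net["id"], "ip_version": 4,
                          "cidr": "10.0.0.0/24", "name": "demo-subnet"}}
subnet = requests.post(NEUTRON_URL + "/subnets", headers=headers,
                       data=json.dumps(subnet_body)).json()["subnet"]

print("network:", net["id"], "subnet:", subnet["id"])
```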

【F-3】 14:40~15:20 Transform IT with RH Enterprise Linux OpenStack Platform - Red Hat's latest on OpenStack - Jeff Jameson

  • Workloads are transforming again
    • Traditional workloads to cloud workloads
      • Traditional workloads
        • typically resides on a single large virtual machine
        • cannot tolerate downtime
        • requires HA
        • application scales up rather than out
      • Cloud workloads
        • workloads reside on multiple VMs
        • tolerates Failure
    • Why are we doing this?
      • Our data is too large
        • vast amount of data
        • way past the ability of traditional systems and apps
        • scaling up no longer works
      • Service requests are too large
        • more and more client devices coming online
        • much harder to maintain service to customers
      • Applications weren't written to cope with demand
    • Why OpenStack?
      • Brings public-cloud-like capabilities into your DC
      • provides massive on-demand (scale out) capacity
        • 1,000s -> 10,000s -> 100ks of VMs
      • It's Open
      • Community development = higher "feature velocity"
        • features and functions you need, faster to market than with proprietary SW
    • What is OpenStack?
      • A massively scalable infra as a service platform
        • HORIZON, NOVA, GLANCE, SWIFT, NEUTRON, CINDER, HEAT, CEILOMETER, KEYSTONE
        • each is developed independently but they work closely together
      • Designed as modular services (see the catalog sketch after these notes)
      • Built for scale out architecture
    • Why RH?
      • OpenStack is dependent on the underlying Linux
        • Running on top of the Linux OS.
        • dependent on all Linux functionality
          • performance, etc... all.
      • needs access to x86 HW resources
      • Needs an operating environment, hypervisor, other system services
      • Uses existing code libraries for functionality
      • and they are sure RH Enterprise Linux is truly reliable.
        • OpenStack is optimized and co-engineered with RHEL
    • The importance of integration with RH enterprise Linux
      • A typical OpenStack cloud is made up of
        • core cloud services
          • nova, glance, swift, ...
        • Plugins to interact with 3rd party ...
      • Examples of RHEL optimized enablers for OpenStack
        • Virtualization
        • Security - SELinux
        • NW - SDN/OVN
        • Storage - vendor plugins, performance, thin provisioning (Ceph)
        • Ecosystem - certification of HW, Storage, and NW
      • the pairing of the Linux OS and OpenStack is so close that RH is uniquely positioned to most effectively support functionality, performance, security, system-wide stability, and the ecosystem
    • World's largest OpenStack partner ecosystem
      • RH OpenStack Cloud infra Partner NW
        • 235+ members
        • over 900 certified solutions in partner Marketplace
        • over 4,000 RHEL certified compute servers
        • over 13,000 applications available on RHEL
        • Large catalog of windows certified applications
    • RH community leadership
      • top contributor to Juno release
        • activity.openstack.org/dash/browser
      • Proof that RH has skills, resources to
        • Support, etc...
      • wide ranging participation, contrasts with most others who are more narrowly focused
      • RH has created enterprise distribution
    • service for OpenStack and Cloud
      • Training
      • Certification
      • Consulting
    • Who is actually using this (customer successes)?
      • NCI (National Computational Infrastructure)
        • AU based company
        • Deployed RH Enterprise Linux OpenStack Platform
        • requires the security certifications RH provided
      • NANYANG TECHNOLOGICAL UNIV.
        • Deployed a hybrid cloud infra with RH Enterprise Linux OpenStack Platform
          • scalability
          • automatic resource provisioning
          • saving cost (allowed better use of existing resources)
          • greater collab between agencies
    • Summary
      • All the benefits of community OpenStack, plus
        • Enterprise hardened code
        • integrated with RH Enterprise Linux
        • Enterprise SW lifecycle
        • World-wide global support
        • Partner ecosystem
        • training, certification, and consulting
        • integrated with a trusted stack
          • RH CloudForms
          • RH Enterprise Virtualization
          • RH Storage (incl. Ceph)
          • Foundation for OpenShift (PaaS)
    • OpenStack enables users to realize a hybrid cloud, e.g. AWS + on-premise
    • What analysts are saying
      • RH is applying its experience in commercializing OSS Linux for the enterprise, and its methodology, to OpenStack
      • The company has made some smart moves in the OpenStack space and it'll work out for them... they've always been the OSS company
    • 3 ways to get OpenStack from RH
      • 90 days evaluation
      • Purchase supported product
        • Enterprise Linux OpenStack Platform
        • Cloud Infrastructure
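
(My own small sketch, not from the session: to see the "designed as modular services" point in practice, you can ask Keystone for a token and print the service catalog, which lists each service's independent endpoint. This assumes the older Keystone v2.0 API that was current around Icehouse/Juno; the URL, tenant and credentials are placeholders.)

```python
# Hedged sketch: list the OpenStack service catalog from Keystone's v2.0 API,
# showing that nova, glance, cinder, neutron, etc. are separate endpoints.
# URL, tenant and credentials are made-up placeholders.
import json
import requests

KEYSTONE_URL = "http://controller:5000/v2.0"

auth_body = {"auth": {"tenantName": "demo",
                      "passwordCredentials": {"username": "demo",
                                              "password": "secret"}}}
resp = requests.post(KEYSTONE_URL + "/tokens",
                     headers={"Content-Type": "application/json"},
                     data=json.dumps(auth_body)).json()

token = resp["access"]["token"]["id"]
print("token:", token[:8], "...")
for service in resp["access"]["serviceCatalog"]:
    public_url = service["endpoints"][0]["publicURL"]
    print("%-10s %-12s %s" % (service["type"], service["name"], public_url))
```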

【F-4】15:50~16:30 OpenStack Nova Technical Deepdive - Nikola Đipanov

  • whoami
    • Hacking on Nova since 2012
    • Core reviewer since 2013
  • Topic covered
    • Overview of Nova deployment and services
    • A look at how services communicate
    • Closer look into internals of some of them (conductor and scheduler)
    • Nova Objects
    • Nova Cells Services
  • OpenStack Nova in a nutshell
    • Manage cloud compute resources through a REST API
    • Schedule and provision VMs
    • Storage and NWing handled by other components
    • VM lifecycle management (start, stop, resize, snapshot, ...)
  • Nova is
    • service oriented architecture
    • a number of services with different functions communicating through a message bus
    • system state kept in a central DB
  • Logical diagram (image)
    • Queue
      • nova-api
      • console
      • compute
      • etc..
  • Services - cast of characters
    • Core: APIs, Scheduler, Conductor, Compute, and maybe NW
    • Helper: Console proxies, consoleauth, objectstore
    • Non-nova: MariaDB, RabbitMQ, memcached, libvirtd
  • Actual deployment image
    • DC scenario
    • general case, a lot of compute nodes
      • Cloud controller spread across multiple nodes
      • Compute nodes with disks
      • optional
        • monitoring, VPN, etc..
  • a.k.a. how services communicate
    • $ nova boot --image fedora --flavor 1 test
      • explained how this works with a diagram
        • API
        • Scheduler
        • Conductor
          • Compute
          • libvirtd
    • RPC and oslo.messaging
      • General purpose messaging library
      • closely maps to AMQP but not only
      • different drivers (Rabbit, qpid, zmq)
      • supports versioning (done in app code)
      • by default uses eventlet green threads for dispatching conn
      • ex. python
        • cctxt = self.client.prepare(server=host, version=version)
        • cctxt.cast(ctxt, 'build_and_run_instance', **data)
    • Scheduler
      • Filter scheduler
      • The only service that is not completely horizontally scalable
      • designed to be non-blocking and to favor quick decisions over correctness
      • in practice, can be a bottleneck because it "learns the world" on every request
        • actually there is a caching technique.
    • Scheduler in more detail
      • opportunistic scheduling - requests can fail when capacity is low
      • simple filtering logic (see the toy sketch after these notes)
      • ...
    • Conductor Service
      • original idea - proxy DB access for compute nodes
      • evolved into a central orchestration service
      • horizontally scalable
        • but one thing, DB is the bottleneck
      • plays the central part in making "Nova objects" work.
        • Nova objects
          • RPC calls are versioned but the data isn't
          • Nova objects + conductor give us that (data versioning); see the versioned-object sketch after these notes
          • road to live upgrades
            • upgrade conductor and DB
            • compute nodes still use old code but conductor makes it work
            • Future: do data migrations on the fly
          • in more detail
            • massively simplifies dealing with database, directly or over RPC
            • lower bar for adding new methods and data
            • bundles data and methods in versioned packages
            • ...
    • Cells - more scale
      • Scale out Nova without doing DB/MQ replication
      • Each cell is a separate Nova w/o API service + a Cell service (own DB and MQ)
      • Parent cell runs Nova API and a cells scheduler that chooses a cell
      • Inter cell comms over a separate message bus
      • in detail
        • Parent cell accepts the request and posts a message on the cells topic
        • One of the cells services picks up the message, runs the scheduling code, and dispatches to the chosen cell
        • most of the DB work is replicated up to the parent cell, as API code still needs to work
      • Advantages
        • Not invasive to current deployments
        • Tree structure - built for scale
        • Has real world users
          • Berkeley uses it
      • Downsides
        • not enough upstream testing
        • still deemed experimental
        • a number of features not supported or broken
        • no horizon support
        • Future - uncertain :(
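
(A toy sketch of my own, not Nova code, to capture the filter-scheduler idea from the session: filters drop hosts that cannot satisfy the request, weighers rank the survivors, and the scheduler picks the top host without taking any locks, which is exactly why a race under low capacity can end in a retry or a failure.)

```python
# Toy illustration of Nova's filter-scheduler idea (not actual Nova code):
# filter out hosts that can't fit the request, weigh the rest, pick the best.
from dataclasses import dataclass
import random


@dataclass
class HostState:
    name: str
    free_ram_mb: int
    free_vcpus: int


def ram_filter(host, req):
    return host.free_ram_mb >= req["ram_mb"]


def cpu_filter(host, req):
    return host.free_vcpus >= req["vcpus"]


def ram_weigher(host):
    # Prefer hosts with the most free RAM (spreading behaviour).
    return host.free_ram_mb


def schedule(hosts, req, filters, weighers):
    candidates = [h for h in hosts if all(f(h, req) for f in filters)]
    if not candidates:
        raise RuntimeError("No valid host found")  # opportunistic: may just fail
    # Sum the weights; real Nova also applies configurable multipliers.
    return max(candidates, key=lambda h: sum(w(h) for w in weighers))


hosts = [HostState("compute-%d" % i,
                   free_ram_mb=random.choice([2048, 8192, 16384]),
                   free_vcpus=random.choice([2, 8, 16])) for i in range(5)]

request = {"ram_mb": 4096, "vcpus": 2}
chosen = schedule(hosts, request, [ram_filter, cpu_filter], [ram_weigher])
print("booting on", chosen.name)
```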
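(And a second sketch for the Nova objects part. The real thing lives in nova/objects and later oslo.versionedobjects, but the core idea, data travelling with a version and the conductor downgrading it for compute nodes still on old code, can be shown in plain Python. Everything below is my simplified illustration, not the actual nova.objects API.)

```python
# Simplified illustration of the "Nova objects" idea: data travels with a
# version, and the conductor can downgrade it for compute nodes that still
# run older code. This mimics the concept only; it is not the nova.objects API.


class InstanceObject:
    VERSION = "1.1"  # pretend version 1.1 added the 'ephemeral_gb' field

    def __init__(self, uuid, memory_mb, ephemeral_gb=0):
        self.uuid = uuid
        self.memory_mb = memory_mb
        self.ephemeral_gb = ephemeral_gb

    def to_primitive(self, target_version=None):
        """Serialize for the message bus, optionally for an older consumer."""
        data = {"uuid": self.uuid, "memory_mb": self.memory_mb,
                "ephemeral_gb": self.ephemeral_gb}
        version = target_version or self.VERSION
        if version == "1.0":
            # Old compute nodes don't know about the new field: drop it.
            data.pop("ephemeral_gb")
        return {"version": version, "data": data}


inst = InstanceObject("fake-uuid", memory_mb=2048, ephemeral_gb=20)
print(inst.to_primitive())                      # what an upgraded node sends
print(inst.to_primitive(target_version="1.0"))  # downgraded for an older node
```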

【F-5】16:45~17:25 OpenStack Nova Deepdive Advanced - Nikola Đipanov again

  • Topics
    • Overview of Nova as a Python project
    • Adding features to Nova
    • Example: Scheduler
    • Virt drivers
    • Road to live upgrades
    • Evolving the data model
  • Motivation for this talk
    • Highlight some issues that influence how the project will evolve
    • Common for large OSS projects
    • useful for people interested in adding features
      • users can add new features to Nova themselves if they want to
  • Nova - the python codebase
    • very large: about 400,000 lines of Python code as of the recent Juno release
      • Issues with a large codebase
        • no single person can be an expert
        • difficult to grow the core team
        • A lot of interactions with unstable APIs - coupling and tech debt
        • Reviews take a long time = downward spiral
  • Adding features upstream - problems
    • not all APIs are versioned
    • data model changes usually not reviewed thoroughly enough
    • complex interactions through ill-defined APIs - edge case bugs that get missed in the review process
    • Solving - a SW engineering challenge (this is challenging)
  • Scheduler - coupling example
    • current design - opportunistic scheduling (no locking, potential retries)
    • this requires the placement logic to be re-run on the compute host
    • which in turn requires all data to be there
      • correct data (format, etc...
  • Booting - data view
    • explained using diagram
    • similar but data view explanation of $ nova boot ....
    • showing python code
  • Several Problems
    • A lot of the data that gets passed around is not versioned
    • There is no standard data model
    • difficult to understand the flow of data
  • There is hope however
  • Scheduler split a.k.a. Gantt project
    • Main idea - have a standalone service
    • Code re-use (all projects implement a scheduler)
    • more scalable
    • Open up the ability to do cross-project aware scheduling (Cinder, Neutron)
    • define the data model first
    • Current (Kilo-targeted) attempts look more promising
    • RH leading the effort
  • Virt drivers
    • Nova ships with pluggable "drivers" for several popular hypervisors
    • Libvirt/KVM, Xen, Hyper-V, VMware VCenter
    • Which driver the Nova compute service will load is configurable (see the loader sketch after these notes)
    • Upstream gate only tests libvirt-KVM; the others are tested through 3rd parties
    • Split out?
      • yet another place where divide and conquer can work
      • core team is the bottleneck
      • very few people are deeply familiar with more than one
      • a slightly more stable API
      • Can they be split out into separate repos?
      • Is there a real benefit?
  • Road to upgrade
    • currently there is a large lockstep
      • roll the DB schema forward (downtime)
      • upgrade everything but compute nodes
    • we have a functioning cloud now
      • thanks to conductor + NovaObject
    • Finally - upgrade compute nodes at your own pace
  • Road to live upgrade
    • Where we want to be?
      • have only conductor services on the critical upgrade path
      • migrate the DB schema over time (no lockstep)
    • ...
  • Evolving the data model
    • Not the only source of problems, but a major one
    • Much better now thanks to NovaObjects
    • Quite performance sensitive
      • because Nova hits the DB heavily when doing this
  • So in short
    • Nova is large - it can cause problems
      • Slow down the project
      • Scaling perf and quality issues
      • Tech debt
    • there are upstream efforts to address these issues
      • this is how OSS works and needs to be taken into account
  • How can non-developers follow progress
    • There is no single ans but
    • follow the nova-specs repository and relevant BPs
    • find out who the key people are (incl. the developers, of course)
    • join the Nova upstream IRC meeting (there is a weekly meeting)

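(A small sketch of my own for the "which virt driver to load is configurable" bullet: nova-compute reads a driver name from configuration and imports the matching class at startup. The FakeDriver class and the way the value is read here are placeholders for illustration, not the exact Nova option handling.)

```python
# Illustration of config-driven virt driver loading, in the spirit of how
# nova-compute picks libvirt/Xen/Hyper-V/VMware drivers. Class and option
# names here are placeholders, not the exact Nova ones.
import importlib


class FakeDriver:
    """Stand-in driver so the example is self-contained."""

    def spawn(self, instance_name):
        print("spawning", instance_name, "with", type(self).__name__)


# Imagine this value came from nova.conf, e.g. compute_driver = __main__.FakeDriver
compute_driver = "__main__.FakeDriver"

module_name, class_name = compute_driver.rsplit(".", 1)
driver_cls = getattr(importlib.import_module(module_name), class_name)

driver = driver_cls()
driver.spawn("test-instance")
```
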
【F-6】17:40~18:20 Ceph loves OpenStack: Why and How - Haruka Iwao

  • Explanation of the Ceph architecture
  • Overview of Ceph
    • What is Ceph?
      • OSS distributed storage
      • supports both object and block
      • aiming at exabyte scale
      • designed for more than 1,000 nodes
    • History - 10 years of it
      • 2004: development started at UCSC
      • 2014: RH acquired Inktank
    • Ceph's unified storage
      • Object Storage
        • S3 and Swift
        • multi-tenant
        • keystone
        • geo-replication
      • Block Storage
        • OpenStack
        • Clone
        • Snapshot
      • File Storage
        • POSIX
        • Linux Kernel
        • CIFS/NFS
        • HDFS
    • The community behind Ceph
      • 306 developers
      • 475 participants
      • 1,668 discussion participants
  • About Inktank Ceph Enterprise
    • In short, the commercial version
    • Abbreviated ICE
    • Ceph + Calamari (monitoring tools, RESTful API) + setup tools and support
      • stricter QA
      • long-term support
    • Benefits of ICE
      • lower cost
        • partly because ops is easier
      • future-proof
        • long-term support
        • a single, easy-to-understand pricing model
        • roadmap
      • expertise
        • Ceph specialists
        • support by the developers themselves
      • enterprise ready
        • leverages existing infrastructure
        • support with an SLA
    • ICE release plan
      • a release every 3 months
        • named in alphabetical order
    • Roadmap
      • 1.2
        • RHEL 7 support
      • 2.0
        • iSCSI
        • RBD mirroring
  • Ceph architecture
    • RADOS - LIBRADOS - RGW, RBD, CEPHFS
    • RADOS
      • Reliable
      • Autonomous
        • nodes talk to each other and detect failures
      • Distributed
      • Object store
      • the core of Ceph
      • all data is stored in RADOS
      • consists of two daemons: mon and osd
        • OSD
          • Object Storage Daemon
          • 1 OSD per disk
          • xfs/btrfs as the backend
          • uses a write-ahead journal for consistency and performance
          • anywhere from 3 to tens of thousands of OSDs
        • mon
          • monitoring daemon
          • manages the cluster map and cluster state
          • can be run on a small odd number of nodes, e.g. 3 or 5
      • the CRUSH algorithm
        • the algorithm used to decide object placement
        • placement is determined 100% by computation alone (see the toy placement sketch after these notes)
        • so no metadata server is needed
          • no SPoF
          • very good scalability
        • cluster map
          • a hierarchical map of the OSDs
            • replicate across failure domains
            • prevent traffic hotspots
    • LIBRADOS (see the librados example after these notes)
    • RGW <-> APP
      • RADOS Gateway
      • REST-based object store proxy
      • S3 and Swift compatible
      • also provides usage statistics for billing
    • RBD <-> HOSTs (diagram)
      • RBD + virtualization
      • RBD + kernel module
      • stores disk images
      • striped across the whole cluster
      • snapshot support
      • Copy on write (CoW)
      • usable from the Linux kernel, KVM and OpenStack
    • CEPHFS
      • POSIX-compatible shared file system
      • an experimental implementation exists in the community version
      • not included in ICE
  • Ceph and OpenStack integration
    • a diagram of the whole picture - easy to follow
    • Swift/Keystone integration
      • integrates authentication
      • can expose a Swift-compatible API
    • Glance integration
      • can be used to store VM OS images
      • uses the driver built into Glance
    • Cinder integration
      • stores disk images in RBD
      • built-in driver inside Cinder
      • CoW clones can be used
    • Nova/hypervisor integration
      • driver integrated into KVM
      • mounts volumes on RBD directly
      • high performance
        • no overhead since it does not go through FUSE or the like
      • stability
  • In closing
    • Ceph What?
      • architecture optimized separately for object and for block
      • a single storage pool
        • high disk utilization
      • built-in driver support in each OpenStack component
    • Why Ceph loves OpenStack?
      • high performance and stability thanks to the built-in drivers
      • clone and snapshot support via CoW
      • a large community
      • RH is the top contributor to both Ceph and OpenStack
      • survey results show it is as widely used as NFS (from the OpenStack user survey)
        • the key point being that these are OpenStack users
    • Ceph's advantages
      • exabyte scale in sight
      • deep integration with OpenStack (the native implementation really matters)
      • a broad, active user community
    • ICE in summary
      • the benefits of Ceph, for the enterprise
        • longer lifecycle
        • installer provided
        • consulting
      • Calamari integration
      • hotfixes provided and fed back into the roadmap
      • also offered in Japan
        • support is in English (for now)
    • RHEL-OSP and ICE
      • the two products RH offers
      • a single solution for OpenStack and its storage
      • one-stop support and consulting

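(To convince myself of the "placement is determined 100% by computation" point, here is a toy stand-in for CRUSH that I wrote, not anything from the talk: given only an object name and the cluster map, every client computes the same set of OSDs, so there is no metadata server to ask. Real CRUSH is pseudo-random and respects the failure-domain hierarchy; this just uses a stable hash.)

```python
# Toy stand-in for CRUSH-style placement: the location of an object is derived
# purely from its name and the cluster map, so any client computes the same
# answer without asking a metadata server. Real CRUSH is hierarchy-aware and
# pseudo-random; this just uses a stable hash for illustration.
import hashlib

CLUSTER_MAP = ["osd.%d" % i for i in range(12)]   # flat map of 12 OSDs
PG_COUNT = 64                                      # placement groups
REPLICAS = 3


def stable_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def place(object_name):
    pg = stable_hash(object_name) % PG_COUNT       # object -> placement group
    start = stable_hash("pg-%d" % pg) % len(CLUSTER_MAP)
    # pg -> ordered list of distinct OSDs (primary first)
    return [CLUSTER_MAP[(start + i) % len(CLUSTER_MAP)] for i in range(REPLICAS)]


for name in ("vm-image-001", "vm-image-002"):
    print(name, "->", place(name))
```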
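(And since LIBRADOS is the layer everything else sits on, a hedged example with the python-rados bindings that writes and reads one object directly. It assumes a reachable cluster, a conf file at /etc/ceph/ceph.conf, keyring access, and an existing pool named "rbd"; all of those are my assumptions, not something stated in the session.)

```python
# Hedged sketch: talking to RADOS directly through the python-rados bindings.
# Assumes a running Ceph cluster, /etc/ceph/ceph.conf, client keyring access,
# and an existing pool called "rbd" -- all assumptions for illustration.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("rbd")      # pool name is an assumption
    try:
        ioctx.write_full("hello-object", b"hello from librados")
        print(ioctx.read("hello-object"))  # any client locates it via CRUSH
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```
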
And that's where my notes end; that's all for this post.
