#garagekidztweetz

The lifelog blog of id:garage-kid@76whizkidz!

Notes on the #dbts2015 Keynote: Being the father of data warehouse

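I attended the first day of db tech show case, held on 2015-06-10, so I will be publishing my notes from the sessions I attended.

First up are my notes from the keynote by Bill Inmon, who is known as the father of the DWH.

Mr. Inmon is apparently 69 years old, but he spoke in clear, careful, and slow English, which made the keynote very easy to follow.

The talk itself contained nothing new, but hearing the person who proposed the DWH speak in person was a very meaningful experience for me.

After the session I asked him for an autograph, and he kindly obliged.

I thought I would thank him by buying one of his books. During the session he said that his books can be bought at reasonable prices on Amazon, and yet... judging from Wikipedia, his most recent book seems to be Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault, and it is fairly expensive... So, er, I will buy it when I have some money to spare...

So, here are my notes.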

  • How in the world did anyone ever become the “father of data warehouse”
    • thinking outside the box
    • understanding data at an architectural level
    • looking for business value and relating that to technology
    • publishing your thoughts
    • truly original session for this event.
  • What is a Data Warehouse?
    • Topic: What is going on in data architecture technology.
      • Data architecture.
        • important to you and your company.
          • important because what is discussed here will influence you and your company in the future.
        • The slides will be available after this presentation.
          • so you can catch up after the session.
        • Tons of books are available on Amazon at reasonable prices.
      • Discussion of evolution of the data architecture (information architecture).
        • Evolution will never end.
    • Picture of a spider web system architecture.
      • Problem: No (or a lack of) data integrity.
        • The same data sometimes has different values in different places.
        • Multiple occurrences of the same data cause this.
        • The DWH was created to solve this problem.
          • DSS based DWH
          • Operational Transaction based systems
      • Applications
        • processing data and store it into database.
          • simple application (inventory management something like that)
          • realtime transaction (OLTP)
            • so useful that most applications could use it.
      • Personal Computer
        • 4GL
        • Get the data from each app and pull it onto individual personal computers.
    • Data Warehouse
      • A single source of truth for data.
      • Characteristics of a DWH
        • subject oriented
          • organise data by subject, not by process.
            • subjects such as customer, product, transaction, etc.
        • Integrated
          • When data is put into the database, it should be integrated.
            • e.g. for gender data, integrate it into one specification, such as M for male and F for female.
        • Nonvolatile
          • Permanency.
            • In a DWH environment, data is never changed.
              • when we want to change data, we need to create a new record (a historical record).
              • in a transaction database, by contrast, data changes whenever a status is updated.
        • Time variant
          • Accuracy.
      • A collection of data for management's decisions
    • Data warehouse basis
      • data is like grains of sand: it can be used for anything (e.g. marketing).
      • DWH data is
        • integrated
          • we don't need several differently interpreted versions of the data.
        • detailed
        • historical
          • transaction databases get rid of historical data for performance reasons.
          • they process only current data.
          • the primary reason for keeping historical data in the DWH is to be able to understand a subject when we start thinking about it.
            • e.g. when we start thinking about a customer:
              • when he started school.
              • when he graduated, etc.
              • and from that data we can predict his future.
    • But a DWH by itself
      • skipped...
    • Soon an infrastructure was built around the data warehouse
      • applications ->
      • ETL ->
      • DWH
        • Data marts (reshape the data for their needs)
          • -> finance
          • -> sales
          • -> marketing
    • Corporate information factory (CIF)
      • most organisations end up with this architecture.
        • very common now.
    • But soon the volumes of data began to grow.
      • you will discover that the DWH grows larger and larger.
    • And it was noticed that data was being divided into either heavily used data or lightly used data
      • if your DWH holds 5 years of data, 95% of the queries will run against the last 3 months of data.
        • 3 months old
          • actively used data
        • 5 years old
          • Dormant data
      • like oil and water.
        • oil rises to the top and water sinks to the bottom.
        • this is what happens as the data grows larger.
    • Data is divided along the lines of
      • volume
      • probability of access
      • type of data being stored
      • age
    • DW2.0 (picture)
      • Enterprise Metadata Repository
      • Master Data
        • vertically sliced
          • Structured data
          • Unstructured data
            • e.g. Email
            • so the evolution in DW2.0 is to put this unstructured data into the DWH.
              • because today most corporations and most DWHs look at structured data.
        • horizontally sliced
          • Interactive
            • Very current
          • Integrated
            • Current++
          • Near line
            • Less than current
          • Archival
            • Older
    • Bigdata and Hadoop
      • everybody needs to know about these.
      • in the U.S., people dedicated to this technology earn a quarter of a million dollars a year.
  • Q&A.
    • What did you do before becoming the father of the DWH?
      • professional golfer???? (lol)
        • Lee Trevino
          • a good person; you would like him.
      • DBA
    • Why DBA?
      • Highest-paid profession. 2nd is DWH people.
        • a lot of places to work.
        • 1st is Ph.D. people.
    • What was the happiest moment of becoming the father of the DWH?
      • it is not the same as becoming the father of a child.
      • the difference between them is that a child is a living individual in the world.
    • There is no question that HW is becoming more powerful.
      • but on the other hand it is a contest, and the growth of data outpaces the growth of HW power.
    • What do you think about data lakes?
      • A data lake has no architecture at all.
      • there are no circumstances under which all data should be stored in a DWH.
    • Hardest project?
      • 50 Billion documents
        • expensive
        • lack of accuracy
    • You majored in math; why did you switch to computer science?
      • when he was in college, computer science did not exist yet.
    • Future of the DB.
      • Attend my rest 2 sessions ;)
        • Actually, I am now interested in
          • what to do about Bigdata
          • how do you handle textual unstructured data
    • Preferred DWH product?
      • Hard to answer.
        • it depends on what you use the DWH for.
      • for mid size
        • Oracle
      • for big size
        • Teradata
      • But I can't say that one particular product is good for the DWH.
  • Ref.
    • TechTarget (a source of information worth keeping up with)
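
To make the "Integrated", "Nonvolatile", and "Time variant" characteristics in the notes above a bit more concrete, here is a minimal Python sketch of my own. It is not from the keynote: the names `GENDER_CODES`, `integrate_gender`, `CustomerRecord`, and `CustomerHistory`, the "U" fallback code, and the sample dates are all hypothetical, and a real DWH would do this in its ETL and database layers rather than in application code.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from source-specific encodings to one warehouse code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def integrate_gender(raw: str) -> str:
    """Integrated: map any source encoding to a single specification."""
    return GENDER_CODES.get(raw.strip().lower(), "U")  # "U" = unknown (assumption)


@dataclass(frozen=True)  # frozen: a stored record can never be modified
class CustomerRecord:
    customer_id: int
    gender: str
    status: str
    valid_from: date


class CustomerHistory:
    """Nonvolatile: a change appends a new dated record instead of updating."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, raw_gender, status, valid_from):
        row = CustomerRecord(customer_id, integrate_gender(raw_gender), status, valid_from)
        self._rows.append(row)
        return row

    def history(self, customer_id):
        """All records for one customer, oldest first."""
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.valid_from)

    def as_of(self, customer_id, day):
        """Time variant: the record that was current on `day`, or None."""
        past = [r for r in self.history(customer_id) if r.valid_from <= day]
        return past[-1] if past else None


# Two source systems encode gender differently; both end up as "M".
wh = CustomerHistory()
wh.load(1, "male", "student", date(2010, 4, 1))
wh.load(1, "M", "graduate", date(2014, 3, 31))
```

Asking `wh.as_of(1, date(2012, 1, 1))` returns the "student" record even though a newer "graduate" record exists, which is the kind of historical view a transaction database that only keeps the current status cannot give.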

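The split between actively used and dormant data mentioned in the notes (most queries hitting only the last 3 months of a 5-year DWH) can also be sketched in a few lines of Python. This is only an illustration under my own assumptions: `split_by_age` and `ACTIVE_WINDOW` are hypothetical names, "3 months" is approximated as 90 days, and rows are plain `(date, payload)` tuples.

```python
from datetime import date, timedelta

# Assumption: "3 months" is approximated as 90 days.
ACTIVE_WINDOW = timedelta(days=90)


def split_by_age(rows, today, window=ACTIVE_WINDOW):
    """Split (row_date, payload) pairs into actively used and dormant rows."""
    cutoff = today - window
    active = [r for r in rows if r[0] >= cutoff]
    dormant = [r for r in rows if r[0] < cutoff]
    return active, dormant


rows = [
    (date(2015, 5, 1), "recent order"),
    (date(2015, 1, 1), "order from earlier this year"),
    (date(2011, 1, 1), "order from years ago"),
]
active, dormant = split_by_age(rows, today=date(2015, 6, 10))
```

Here only the first row lands in `active`; the other two are dormant and would be candidates for the near line or archival slices in the DW2.0 picture.
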
Slides.

The slides still do not seem to be available on the event website. When I find them, I will put the link here.
