view · edit · sidebar · attach · print · history

20101202-update-outline-spreadsheet

<< Masa.20101203-update-outline-spreadsheet | 2010 | Masa.20101201-update-bbmb-ch-update-spreadsheet >>


  1. Make a sample - from one with outline to one without outline
  2. Analyze Reader and Writer classes suspend
  3. Trace write_colinfo method

Goal
  • Check outline function spreadsheet / 80%
Milestones
  1. Review
  2. test writing (making) a xls file through spreadsheet suspend
  3. Read Reader and Writer classes suspend
  4. Trace write_colinfo
Summary
Commits
ToDo Tomorrow
Keep in Mind
  1. todos
  2. BAG: Files published obviously only on the first ch.ODDB.org Report - Error: Generikaliste - 12/2010 log passwd
  3. Testcase of 'Object Stream' parser (for rpdf2txt/lib/rpdf2txt/parser.rb#rebuild_object_catalogue method) 20101126-update-rpdf2txt
  4. Testcases of lib/oddb/html/state/global.rb#grant_download, lib/oddb/html/view/download.rb#to_html
  5. Debug testcases in test/export/test_server.rb de.oddb.org
  6. A bug import_gkv Tue Nov 16 02:00:10 2010: de.oddb.org Zubef (PDF)
  7. Compression (refer to lib/oddb/export/server.rb), Test cases (grant_download, Logging, Reporting)
  8. Log Error: on production server, de.oddb.org/log/import_dimdi, import_pharmnet
  9. On Ice
  10. emerge --sync

Make a sample - from one with outline to one without outline

Memo

  • If I save a xls file made by MS Office by Open Office, the file size changes
masa@masa ~/ywesee/spreadsheet $ ls -al
-rwxr--r--  1 masa masa 23552  2. Dez 08:20 test2.xls

masa@masa ~/ywesee/spreadsheet $ of test2.xls

masa@masa ~/ywesee/spreadsheet $ ls -al
-rwxr--r--  1 masa masa  6656  2. Dez 08:22 test2.xls

Note

  • The size becomes 1/4

Memo

  • A file (.xlsx) saved by Office 2010 is not recognized by spreadsheet
masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test5.xlsx 
/usr/lib64/ruby/gems/1.8/gems/ruby-ole-1.2.10.1/lib/ole/storage/base.rb:376:in `validate!': OLE2 signature is invalid (Ole::Storage::FormatError)
        from /usr/lib64/ruby/gems/1.8/gems/ruby-ole-1.2.10.1/lib/ole/storage/base.rb:368:in `initialize'
        from /usr/lib64/ruby/gems/1.8/gems/ruby-ole-1.2.10.1/lib/ole/storage/base.rb:110:in `new'
        from /usr/lib64/ruby/gems/1.8/gems/ruby-ole-1.2.10.1/lib/ole/storage/base.rb:110:in `load'
        from /usr/lib64/ruby/gems/1.8/gems/ruby-ole-1.2.10.1/lib/ole/storage/base.rb:77:in `initialize'
        from /usr/lib64/ruby/gems/1.8/gems/ruby-ole-1.2.10.1/lib/ole/storage/base.rb:83:in `new'
        from /usr/lib64/ruby/gems/1.8/gems/ruby-ole-1.2.10.1/lib/ole/storage/base.rb:83:in `open'
        from ./lib/spreadsheet/excel/reader.rb:1139:in `setup'
        from ./lib/spreadsheet/excel/reader.rb:122:in `read'
        from ./lib/spreadsheet/excel/workbook.rb:32:in `open'
        from ./lib/spreadsheet.rb:62:in `open'
        from ./lib/spreadsheet.rb:68:in `open'
        from test.rb:5

Experiment

require 'spreadsheet'


file = ARGV[0]
book = Spreadsheet.open(file, 'rb')
sheet= book.worksheet(0)

5.times do |i|
  print i," ", sheet.column(i).width, "\n"
end
puts
sheet.column(0).width = 100
5.times do |i|
  print i," ", sheet.column(i).width, "\n"
end

book.write "aaa.xls"

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test2.xls 
0 10
1 10
2 0.0
3 0.0
4 11.42578125

0 100
1 10
2 0.0
3 0.0
4 11.42578125

Notes

  • The value of the Column object changed, but the actual width of column in the xls file does not change
  • 'width=0.0' means 'hidden'

Consideration

  • Even if we change the property of Column class, the value is not reflected to the workbook object
  • I have to call some method in order to apply the value to the current workbook object

Experiment

require 'spreadsheet'

file = ARGV[0]
book = Spreadsheet.open(file, 'rb')
sheet= book.worksheet(0)

#sheet.column(0)[0] = "abc"  #=> error
sheet[0,0]="abc"
book.write "out1.xls"
sheet.column(10000).worksheet[0,0] = "abc"
book.write "out2.xls"

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test2.xls 
masa@masa ~/ywesee/spreadsheet $ diff out1.xls out2.xls 

Notes

  • It is impossible to rewrite values in a xls file through Column class
  • because Column class gets actual values from Worksheet instance object
    • The column class looks like a reader class (we cannot change values through Column class)
  • It is possible to rewrite a value of xls file via Worksheet object
  • It seems that the Worksheet object has actual values for the xls file

Memo

  • There are two 'Worksheet' classes, Spreadsheet::Worksheet and Spreadsheet::Excel::Worksheet
  • But Spreadsheet::Excel::Worksheet inherits Spreadsheet::Worksheet

lib/spreadsheet/excel/worksheet.rb

module Spreadsheet
  module Excel
##
# Excel-specific Worksheet methods. These are mostly pertinent to the Excel
# reader, and to recording changes to the Worksheet. You should have no reason
# to use any of these.
class Worksheet < Spreadsheet::Worksheet

Next

  • Probably I have to understand byte structure in order to rewrite a xls file, or
  • in order to update Writer class
  • Most of the methods of Writer class docode data into byte code

Question

  • Where (How) does the Writer class write the Column format data?

Analyze Reader and Writer classes

Problems

  • There are so many similar (same) names in methods and classes
  • For example, write method, Workbook class, Worksheet class, etc.

Search which 'write' method is called

Experiment

grep seach

masa@masa ~/ywesee/spreadsheet $ grep -r "def write " lib
lib/spreadsheet/excel.rb:    def write row, col, data=nil, format=nil
lib/spreadsheet/workbook.rb:    def write io_path_or_writer
lib/spreadsheet/writer.rb:    def write workbook

lib/spreadsheet/writer.rb

module Spreadsheet
  ##
  # Parent Class for all Writers. Implements the copying of unmodified
  # Spreadsheet documents.
  class Writer
    def initialize io_or_path
      @io_or_path = io_or_path
    end
    def write workbook
p "getin Spreadsheet::Writer#write"
...

lib/spreadsheet/workbook.rb

    def write io_path_or_writer
p "getin Spreadsheet::Workbook#write"
      if io_path_or_writer.is_a? Writer
        io_path_or_writer.write self
      else
        writer(io_path_or_writer).write(self)
      end
    end

1. lib/spreadsheet/writer.rb#write

    def write workbook
      if @io_or_path.respond_to? :seek
        @io_or_path.binmode
        write_workbook workbook, @io_or_path
      else
        File.open(@io_or_path, "wb+") do |fh|
          write_workbook workbook, fh           #<= here
        end
      end
    end

2. lib/spreadsheet/writer.rb#write

    def write_workbook workbook, io
      reader = workbook.io
      unless io == reader
        reader.rewind
        data = reader.read
        io.rewind
        io.write data
      end
    end

test.rb

require 'spreadsheet'

file = ARGV[0]
book = Spreadsheet.open(file, 'rb')
sheet= book.worksheet(0)
book.write "out.xls"

Run test.rb

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test1.xls 
"getin Spreadsheet::Workbook#write"
"getin Spreadsheet::Writer#write"

Note

  1. Spreadsheet::Workbook#write
  2. Spreadsheet::Writer#write
  • Spreadsheet::Excel::Worksheet#write is not called

Trace Spreadsheet::Workbook#write method

1.

    def write io_path_or_writer
      if io_path_or_writer.is_a? Writer
        io_path_or_writer.write self
      else
        writer(io_path_or_writer).write(self)
      end
    end

2. lib/spreadsheet/workbook.rb#writer

    def writer io_or_path, type=Excel, version=self.version
      if type == Excel
        Excel::Writer::Workbook.new io_or_path
      else
        raise NotImplementedError, "No Writer defined for #{type}"
      end
    end

3. lib/spreadsheet/excel/writer/workbook.rb Notes

  • There is no 'write' method defined in this class
  • This Workbook class inherits Spreadsheet::Writer
  • There is a 'write' method in Spreadsheet::Writer

4. lib/spreadsheet/writer.rb

    def write workbook
      if @io_or_path.respond_to? :seek
        @io_or_path.binmode
        write_workbook workbook, @io_or_path
      else
        File.open(@io_or_path, "wb+") do |fh|
          write_workbook workbook, fh
        end
      end
    end

5. lib/spreadsheet/writer.rb

    def write_workbook workbook, io
      reader = workbook.io
      unless io == reader
        reader.rewind
        data = reader.read
        io.rewind
        io.write data
      end
    end

Notes

  • It looks like a just copy process?

Question

  • Where is the updated data saved?

Hypothesis

  • Reader class reads Workbook object in the memory
  • It means that Reader class converts the Workbook object intor byte code, maybe

Trace write_colinfo method

Memo

  • I have found 'write_colinfo(s)' method!
  • This must be the converter from object to byte code

lib/spreadsheet/excel/writer/worksheet.rb#write_colinfos

  def write_colinfos
p "getin write_colinfos"
    cols = @worksheet.columns
    bunch = []
    cols.each_with_index do |column, idx|
      if column
        bunch << column
        if cols[idx.next] != column
          write_colinfo bunch
          bunch.clear
        end
      end
    end
  end

Experiment

test.rb

require 'spreadsheet'


file = ARGV[0]
book = Spreadsheet.open(file, 'rb')
sheet= book.worksheet(0)
book.write "out.xls"

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test2.xls 
masa@masa ~/ywesee/spreadsheet $ 

Notes

  • 'write_colinfos' method is not called

Memo

  • 'write_colinfos' method is called from 'write_from_scratch' method
  • 'write_from_scratch' method is called in lib/spreadsheet/excel/writer/workbook.rb

Experiment

lib/spreadsheet/excel/writer/workbook.rb

  ##
  # The main writer method. Calls #write_from_scratch or #write_changes
  # depending on the class and state of _workbook_.
  def write_workbook workbook, io
p "getin write_workbook"
    unless workbook.is_a?(Excel::Workbook) && workbook.io
p "A"
      @date_base = Date.new 1899, 12, 31
      write_from_scratch workbook, io
    else
p "B"
      @date_base = workbook.date_base
      if workbook.changes.empty?
p "C"
        super
      else
p "D"
        write_changes workbook, io
      end
    end
  ensure
    cleanup workbook
  end

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test2.xls 
"getin write_workbook"
"B"
"C"

Notes

  • Namely, 'workbook.changes.empty?' becomes true and 'super' is called
  • 'super' is the same name method of the super class, Spreadsheet::Writer#write_workbook method
  • That is why it looks like a copy process

Experiment

test.rb

require 'spreadsheet'

file = ARGV[0]
book = Spreadsheet.open(file, 'rb')
sheet= book.worksheet(0)
sheet[0,0] = 123
book.write "out.xls"

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test2.xls 
"getin write_workbook"
"B"
"D"

Notes

  • In this time, 'write_change' method is called, but not 'write_from_scratch' method

Experiment

lib/spreadsheet/excel/writer/workbook.rb

  def write_changes workbook, io
    sanitize_worksheets workbook.worksheets
    collect_formats workbook, :existing_document => true
    reader = workbook.ole
    sheet_data = {}
    sst_status, sst_total, sst_strings = complete_sst_update? workbook
    sst = {}
    sst_strings.each_with_index do |str, idx| sst.store str, idx end
    sheets = worksheets(workbook)
    positions = []
    newsheets = []
    sheets.each do |sheet|
      @sst[sheet] = sst
      pos, len = workbook.offsets[sheet.worksheet]
print "pos="
p pos
print "len="
p len
      if pos
p "AA"
        positions.push pos
        sheet.write_changes reader, pos + len, sst_status
      else
p "BB"
        newsheets.push sheet
        sheet.write_from_scratch
      end
      sheet_data[sheet.worksheet] = sheet.data
    end
...

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test.rb test2.xls 
"getin write_workbook"
"B"
"D"
pos=12144
len=894
"AA"
pos=13038
len=390
"AA"
pos=13428
len=390
"AA"

Notes

  • 'write_from_scratch' is not called, then 'write_colinfo' method is not called either

Experiment

test2.rb

require 'spreadsheet'


book = Spreadsheet::Workbook.new
sheet = book.create_worksheet
sheet.name = 'some sheet'
sheet[0,0] = 123
sheet.column(0).width = 100
book.write 'out.xls'

Result

masa@masa ~/ywesee/spreadsheet $ ruby -I lib test2.rb 
"getin write_workbook"
"A"
"getin write_colinfos"

Note

  • 'write_colinfo' is called!!

Notes

  • If a new xls file is created, not open a xls file, 'write_from_scratch' method is called
  • then all the object data in the memory is converted into byte code, maybe
  • At least, 'write_changes' does not convert Column object data into byte code

Experiment

require 'spreadsheet'

book = Spreadsheet::Workbook.new
sheet = book.create_worksheet
sheet.name = 'some sheet'
sheet[0,0] = 1
sheet[0,1] = 2
sheet[0,2] = 3
sheet[0,3] = 4
sheet[0,4] = 5

sheet.column(0).outline_level = 0
sheet.column(1).outline_level = 256
sheet.column(2).outline_level = 256
sheet.column(3).outline_level = 0
sheet.column(4).outline_level = 0

sheet.column(1).hidden = true
sheet.column(2).hidden = true

book.write 'out.xls'

Result

  • The columns are hidden but outline buttons are not shown
view · edit · sidebar · attach · print · history
Page last modified on July 13, 2011, at 12:06 PM