Proposal: Array#to_h, to simplify hash generation

  1. #1

    Proposal: Array#to_h, to simplify hash generation

    Hi -talk,

    Ruby has wonderful support for chewing and spitting arrays. For
    instance, it's easy to produce an array from any Enumerable using
    #map. With hashes, however, it's a bit more cumbersome.

    For example, the following method is typical of my code:

    # return { filename -> size }
    def get_local_gz_files
      files = {}
      Dir["*.gz"].each do |filename|
        files[filename] = File.stat(filename).size
      end
      files
    end

    The pattern is: create an empty hash, populate it, and return it. Now
    Ruby is a wonderfully expressive and terse language. Accordingly, the
    two lines devoted to initialising and returning the hash in the above
    code seem wasted.

    If Ruby had Array#to_h, then I could rewrite it as:

    # return { filename -> size }
    def get_local_gz_files
      Dir["*.gz"].map { |filename|
        [ filename, File.stat(filename).size ]
      }.to_h
    end


    The proposed implementation of Array#to_h is per the following code:

    class Array
      def to_h
        hash = {}
        self.each do |elt|
          raise TypeError unless elt.is_a? Array
          key, value = elt[0..1]
          hash[key] = value
        end
        hash
      end
    end


    For the final justification, note that this is the logical reverse of
    Hash#to_a:

    h = {:x => 5, :y => 10, :z => -1 }
    a = h.to_a # => [[:z, -1], [:x, 5], [:y, 10]]

    # And now, for my next trick...
    a.to_h == h # => true (gosh, that actually worked)


    Thoughts?

    Gavin


    Gavin Sinclair Guest

  3. #2

    Re: Proposal: Array#to_h, to simplify hash generation

    >>>>> "G" == Gavin Sinclair <gsinclair@soyabean.com.au> writes:

    G> def get_local_gz_files
    G> files = {}
    G> Dir["*.gz"].each do |filename|
    G> files[filename] = File.stat(filename).size
    G> end
    G> files
    G> end

    svg% cat b.rb
    #!/usr/bin/ruby
    def get_local_c_files
      Hash[*Dir["*.c"].map do |filename|
        [filename, File.stat(filename).size]
      end.flatten]
    end
    p get_local_c_files
    svg%

    svg% b.rb
    {"st.c"=>10714, "range.c"=>10706, "enum.c"=>11250, "util.c"=>22676,
    "sprintf.c"=>12332, "re.c"=>38877, "version.c"=>1094, "random.c"=>6485,
    "object.c"=>34530, "class.c"=>17870, "main.c"=>988, "compar.c"=>2720,
    "array.c"=>43170, "process.c"=>30792, "io.c"=>82748, "dln.c"=>39614,
    "variable.c"=>35056, "time.c"=>32796, "string.c"=>69845, "regex.c"=>123352,
    "numeric.c"=>36979, "inits.c"=>1765, "dmyext.c"=>20, "dir.c"=>21761,
    "signal.c"=>13318, "pack.c"=>39965, "math.c"=>6199, "hash.c"=>39087,
    "error.c"=>25114, "parse.c"=>348857, "ruby.c"=>22725, "marshal.c"=>27620,
    "lex.c"=>4480, "bignum.c"=>34051, "struct.c"=>15141, "prec.c"=>1677,
    "gc.c"=>34935, "file.c"=>58392, "eval.c"=>219839}
    svg%


    Guy Decoux


    ts Guest

  4. #3

    Re: Proposal: Array#to_h, to simplify hash generation

    On Sat, Jul 19, 2003 at 11:22:20PM +0900, Gavin Sinclair wrote:
    > If Ruby had Array#to_h, then I could rewrite it as:
    It does, almost:

    irb(main):001:0> a = ["cat","one","dog","two"]
    => ["cat", "one", "dog", "two"]
    irb(main):002:0> Hash[*a]
    => {"cat"=>"one", "dog"=>"two"}

    I don't remember seeing an exact inverse of Hash#to_a though, i.e. one which
    converts [[a,b],[c,d]] to {a=>b, c=>d}

    You can always 'flatten' your array, as long as the elements of the hash
    you're creating aren't themselves arrays.
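
    A minimal sketch of that caveat, assuming a Ruby whose Array#flatten
    accepts a depth argument (the variable names are illustrative only):

```ruby
# Flat [key, value] pairs survive a full flatten.
pairs = [["cat", "one"], ["dog", "two"]]
flat_ok = Hash[*pairs.flatten]
# => {"cat"=>"one", "dog"=>"two"}

# Array values are flattened as well, so keys and values get misaligned.
nested = [["evens", [2, 4]], ["odds", [1, 3]]]
mangled = Hash[*nested.flatten]
# => {"evens"=>2, 4=>"odds", 1=>3}

# Flattening a single level keeps the array values intact.
one_level = Hash[*nested.flatten(1)]
# => {"evens"=>[2, 4], "odds"=>[1, 3]}
```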

    Regards,

    Brian.

    Brian Candler Guest

  5. #4

    Re: Proposal: Array#to_h, to simplify hash generation

    Hi,

    In message "Proposal: Array#to_h, to simplify hash generation"
    on 03/07/19, Gavin Sinclair <gsinclair@soyabean.com.au> writes:

    |If Ruby had Array#to_h, then I could rewrite it as:
    |
    | # return { filename -> size }
    | def get_local_gz_files
    | Dir["*.gz"].map { |filename|
    | [ filename, File.stat(filename).size ]
    | }.to_h
    | end

    It has been proposed several times. The issues are

    * whether the name "to_h" is a good name or not. somebody came up
    with the name "hashify". I'm not excited by either name.

    * what if the original array is not an assoc array (array of arrays
    of two elements). raise error? ignore?
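
    The two policies in the second point ("raise error? ignore?") could be
    sketched roughly as below. Both helper names are hypothetical and not
    part of Ruby; they are plain methods so that nothing in Array is
    redefined.

```ruby
# Strict policy: any element that is not a two-element array is an error.
def strict_pairs_to_hash(array)
  array.each_with_object({}) do |elt, hash|
    unless elt.is_a?(Array) && elt.size == 2
      raise TypeError, "not a [key, value] pair: #{elt.inspect}"
    end
    hash[elt[0]] = elt[1]
  end
end

# Lenient policy: elements that are not pairs are silently skipped.
def lenient_pairs_to_hash(array)
  array.each_with_object({}) do |elt, hash|
    next unless elt.is_a?(Array) && elt.size == 2
    hash[elt[0]] = elt[1]
  end
end

mixed = [[1, 2], "x", [7, 8], "g"]

lenient = lenient_pairs_to_hash(mixed)
# => {1=>2, 7=>8}

strict_error = begin
  strict_pairs_to_hash(mixed)
  nil
rescue TypeError => e
  e
end
```

    The strict version fails loudly on the first bad element, while the
    lenient one can hide a malformed input.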

    matz.

    Yukihiro Matsumoto Guest

  6. #5

    Re: Proposal: Array#to_h, to simplify hash generation

    On Sunday, July 20, 2003, 1:31:42 AM, Yukihiro wrote:
    > Hi,
    > In message "Proposal: Array#to_h, to simplify hash generation"
    > on 03/07/19, Gavin Sinclair <gsinclair@soyabean.com.au> writes:
    > |If Ruby had Array#to_h, then I could rewrite it as:
    > |
    > | # return { filename -> size }
    > | def get_local_gz_files
    > | Dir["*.gz"].map { |filename|
    > | [ filename, File.stat(filename).size ]
    > | }.to_h
    > | end
    > It has been proposed several times.
    I thought it sounded familiar, but didn't see an RCR.
    > The issues are
    > * whether the name "to_h" is a good name or not. somebody came up
    > with the name "hashify". I'm not excited by either name.
    #to_h sounds good to me - we already have to_s, to_a, to_i, etc. It's
    just too sweet that Hash#to_a and Array#to_h should be the inverse of
    each other.

    What don't you like about #to_h?

    #to_hash is fine by me too, but I don't really know the nuances of
    to_s/to_str, to_a/to_ary, ...
    > * what if the original array is not an assoc array (array of arrays
    > of two elements). raise error? ignore?
    Raise error. #to_h is clearly a method to be used with care. People
    are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
    could be the equivalent to Hash[1,2,3,4]. But then there's the corner
    case: [ [1,2], "x", [7,8], "g" ].to_h.

    I think I would insist on the input being an assoc array.

    Gavin


    Gavin Sinclair Guest

  7. #6

    Re: Proposal: Array#to_h, to simplify hash generation

    Hi,

    In message "Re: Proposal: Array#to_h, to simplify hash generation"
    on 03/07/20, Gavin Sinclair <gsinclair@soyabean.com.au> writes:

    |I thought it sounded familiar, but didn't see an RCR.

    I don't remember the RCR number. Search for "hashify".

    |What don't you like about #to_h?

    I just didn't feel we had consensus. Besides, the "to_h" you've proposed
    works for arrays with a specific structure (assoc-like).

    |#to_hash is fine by me too, but I don't really know the nuances of
    |to_s/to_str, to_a/to_ary, ...

    Longer versions are for implicit conversion. An object that has
    "to_str" works like a string if it's given as an argument.

    Note we have "to_hash" already. But this would not be the reason for
    "to_h". We have "to_io" without the shorter version, for example.

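    A small sketch of the to_s / to_str distinction described above. The
    two classes are invented for illustration; only String#+ and its
    implicit use of to_str are standard behaviour.

```ruby
# Has to_s only: can be converted explicitly, but is not string-like.
class Name
  def initialize(text)
    @text = text
  end

  def to_s
    @text
  end
end

# Adds to_str: may now stand in wherever a String argument is expected.
class StringLike < Name
  def to_str
    @text
  end
end

greeting = "Hello, " + StringLike.new("matz")
# => "Hello, matz"

# Without to_str the implicit conversion is refused.
rejected = begin
  "Hello, " + Name.new("matz")
  false
rescue TypeError
  true
end
```
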
    |> * what if the original array is not an assoc array (array of arrays
    |> of two elements). raise error? ignore?
    |
    |Raise error. #to_h is clearly a method to be used with care. People
    |are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
    |could be the equivalent to Hash[1,2,3,4]. But then there's the corner
    |case: [ [1,2], "x", [7,8], "g" ].to_h.
    |
    |I think I would insist on the input being an assoc array.

    TypeError? or ArgumentError?

    I just remembered that I thought Hash[ary] might be the better
    solution. I'm not sure why I didn't implement it. I have very loose
    memory.
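
    For comparison, a sketch of the Hash[ary] form, assuming a Ruby
    version in which Hash.[] accepts a single array of [key, value] pairs:

```ruby
# Values that are themselves arrays are left alone; nothing is flattened.
pairs = [["cat", "one"], ["dog", ["two", "three"]]]
from_pairs = Hash[pairs]
# => {"cat"=>"one", "dog"=>["two", "three"]}
```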

    matz.

    Yukihiro Matsumoto Guest

  8. #7

    Re: Proposal: Array#to_h, to simplify hash generation

    On Sunday, July 20, 2003, 2:56:08 AM, Yukihiro wrote:
    > Hi,
    > In message "Re: Proposal: Array#to_h, to simplify hash generation"
    > on 03/07/20, Gavin Sinclair <gsinclair@soyabean.com.au> writes:
    > |I thought it sounded familiar, but didn't see an RCR.
    > I don't remember the RCR number. Search for "hashify".
    It's #12. Interesting: I like the #hashify idea better than my proposal.

    My original code could be written

    # return { filename -> size }
    def get_local_gz_files
      Dir["*.gz"].to_hash { |filename| File.stat(filename).size }
    end

    That does away with the intermediate assoc array, and is overall very
    elegant. Best of all, it can be used with any Enumerable type, and it
    doesn't have any requirement on the structure of the receiver.

    module Enumerable
      def to_hash
        result = {}
        each do |elt|
          result[elt] = yield(elt)
        end
        result
      end
    end


    That is capturing the very idiom I have repeated so many times.

    Alternatives to #to_hash are:
    hashify (the original and the worst :)
    map_hash
    hash_map (it is, after all, mapping a collection into a hash)

    I think I like "map_hash" the best.

    ["cat", "dog", "mouse"].map { |s| s.length }
    # -> [3, 3, 5]

    ["cat", "dog", "mouse"].map_hash { |s| s.length }
    # -> {"cat"=>3, "mouse"=>5, "dog"=>3}
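
    To illustrate the "any Enumerable" point, here is a sketch that defines
    map_hash (not a built-in method; the body follows the to_hash code
    above) and calls it on a Range and on an Enumerator:

```ruby
module Enumerable
  # Map each element to a value and collect the results as element => value.
  def map_hash
    result = {}
    each { |elt| result[elt] = yield(elt) }
    result
  end
end

squares = (1..3).map_hash { |n| n * n }
# => {1=>1, 2=>4, 3=>9}

codes = "abc".each_char.map_hash { |c| c.ord }
# => {"a"=>97, "b"=>98, "c"=>99}
```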

    Gavin


    Gavin Sinclair Guest

  9. #8

    Re: Proposal: Array#to_h, to simplify hash generation

    > I just didn't feel we had consensus. Besides, the "to_h" you've proposed
    > works for arrays with a specific structure (assoc-like).
    Far be it from me to say anything of much value, but I definitely think
    that an instance function of Class Array should have a defined behavior
    for all Arrays. Is there any argument to the contrary?

    -Kurt

    Kurt M. Dresner Guest

  10. #9

    Re: Array and Hash to_s

    > The main problem here is that Array#to_s calls join with the default
    > field separator, which for some reason is "". To me, this isn't
    > intuitive. Is there some historical reason why this behavior exists?
    > Even less intuitive to me is Hash#to_s, because the way the conversion
    > is done you lose any concept it was a hash.
    It's intuitive because it's the opposite of taking a string and putting
    each character as an element of an array.

    "foobar" -> ['f','o','o','b','a','r'] -> "foobar"

    If you want a different .to_s you can just join with something else.
    It's pretty easy to just do foobararray.join(',') if you want
    "f,o,o,b,a,r", and additionally it's a little easier to read.

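    The round trip can be sketched with explicit split and join calls.
    Array#to_s itself is avoided here because its output differs between
    Ruby versions:

```ruby
chars = "foobar".split("")
# => ["f", "o", "o", "b", "a", "r"]

rejoined = chars.join       # no separator: back to the original string
with_commas = chars.join(",")
```
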
    -Kurt

    Kurt M. Dresner Guest

  11. #10

    Re: Proposal: Array#to_h, to simplify hash generation

    Gavin Sinclair <gsinclair@soyabean.com.au> wrote:
    > Raise error. #to_h is clearly a method to be used with care. People
    > are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
    > could be the equivalent to Hash[1,2,3,4]. But then there's the corner
    > case: [ [1,2], "x", [7,8], "g" ].to_h.
    >
    > I think I would insist on the input being an assoc array.
    And we already have Array methods that assume an associative array.

    m.
    Martin DeMello Guest

  12. #11

    Re: Proposal: Array#to_h, to simplify hash generation

    On Sun, 20 Jul 2003 03:20:50 +0900, Kurt M. Dresner wrote:
    >> I just didn't feel we had consensus. Besides, the "to_h" you've proposed
    >> works for arrays with a specific structure (assoc-like).
    >
    > Far be it from me to say anything of much value, but I definitely think
    > that an instance function of Class Array should have a defined behavior
    > for all Arrays. Is there any argument to the contrary?
    >
    > -Kurt
    pack, assoc, and rassoc
    Tim Hunter Guest

  13. #12

    Re: Proposal: Array#to_h, to simplify hash generation

    Hi,

    In message "Re: Proposal: Array#to_h, to simplify hash generation"
    on 03/07/20, Martin DeMello <martindemello@yahoo.com> writes:

    |> I think I would insist on the input being an assoc array.
    |
    |And we already have Array methods that assume an associative array.

    I think you mean assoc and rassoc. But they are look-up methods. No
    harm would happen for non assoc input for them. I feel like Hash
    creation is a little bit different.
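
    A short sketch of why the look-up methods are harmless on input that is
    not a proper assoc array: elements that are not arrays are simply
    passed over.

```ruby
mixed = [[1, 2], "x", [7, 8], "g"]

found    = mixed.assoc(7)    # => [7, 8]  (matches on the first element)
missing  = mixed.assoc("x")  # => nil     (the bare string is skipped)
by_value = mixed.rassoc(2)   # => [1, 2]  (matches on the second element)
```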

    matz.

    Yukihiro Matsumoto Guest

  14. #13

    Re: Proposal: Array#to_h, to simplify hash generation

    Yukihiro Matsumoto <matz@ruby-lang.org> wrote:
    > In message "Re: Proposal: Array#to_h, to simplify hash generation"
    > on 03/07/20, Martin DeMello <martindemello@yahoo.com> writes:
    > |
    > |And we already have Array methods that assume an associative array.
    >
    > I think you mean assoc and rassoc. But they are look-up methods. No
    > harm would happen for non assoc input for them. I feel like Hash
    > creation is a little bit different.
    Actually, I've always felt those were out of place in Array too. And if
    they were factored out into an AssocArray mixin, we could conveniently
    put hashify there.

    martin
    Martin DeMello Guest

  15. #14

    Re: Proposal: Array#to_h, to simplify hash generation

    Hi --

    On Mon, 21 Jul 2003, Martin DeMello wrote:
    > Yukihiro Matsumoto <matz@ruby-lang.org> wrote:
    > > In message "Re: Proposal: Array#to_h, to simplify hash generation"
    > > on 03/07/20, Martin DeMello <martindemello@yahoo.com> writes:
    > > |
    > > |And we already have Array methods that assume an associative array.
    > >
    > > I think you mean assoc and rassoc. But they are look-up methods. No
    > > harm would happen for non assoc input for them. I feel like Hash
    > > creation is a little bit different.
    >
    > Actually, I've always felt those were out of place in Array too. And if
    > they were factored out into an AssocArray mixin, we could conveniently
    > put hashify there.
    But the special case of converting an associative array to a hash is
    different from the "classic" (in terms of volume of ruby-talk devoted
    to it, and how long we've been discussing it :-) array-to-hash
    conversion, as per RCR #12 and its definition of "hashify" (a term I
    proposed reluctantly, knowing people would hate it :-) but it seemed
    the most accurate for what I was describing). Modularization is a
    good idea, though, particularly for the various home-grown
    [{to_(h}ash]ify) variants in circulation, though organizing that kind
    of thing community-wide is something I've never figured out how to do.


    David

    --
    David Alan Black
    home: dblack@superlink.net
    work: blackdav@shu.edu
    Web: http://pirate.shu.edu/~blackdav


    dblack@superlink.net Guest

  16. #15

    Re: Proposal: Array#to_h, to simplify hash generation

    Hi --

    On Mon, 21 Jul 2003, Gavin Sinclair wrote:
    > I like "make_hash". We would get more flexibility if "make_hash" insisted
    > on receiving two values for the block: one for key and one for value. In
    > one instance recently, I wanted to map the "filename" part of a data
    > object to the object itself. This, I think, is readable:
    > map = receipts.make_hash { |r| r.filename, r }
    >
    > Whereas in my pet case of mapping filename to size, we have
    >
    > map = filenames.make_hash { |fn| fn, File.stat(fn).size }
    >
    > And your example comes out as
    >
    > (1..10).make_hash { |i| i, f(i) }
    (Wouldn't you have to wrap your two return values in an array to get
    the above to parse?)
    > I've raised an RCR for this (#148).
    I'm not sure how this differs from (rejected) RCR#12 (except for
    having to return a key as well as a value).
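
    A sketch of a make_hash whose block returns the pair wrapped in an
    array, which does parse. This is one possible reading of the proposal,
    not the RCR's actual text, and make_hash is not a built-in method.

```ruby
module Enumerable
  # The block must return [key, value] for each element.
  def make_hash
    result = {}
    each do |elt|
      key, value = yield(elt)
      result[key] = value
    end
    result
  end
end

squares = (1..3).make_hash { |i| [i, i * i] }
# => {1=>1, 2=>4, 3=>9}

lengths = %w[cat mouse].make_hash { |s| [s.upcase, s.length] }
# => {"CAT"=>3, "MOUSE"=>5}
```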


    David

    --
    David Alan Black
    home: dblack@superlink.net
    work: blackdav@shu.edu
    Web: http://pirate.shu.edu/~blackdav


    dblack@superlink.net Guest

  17. #16

    Re: Proposal: Array#to_h, to simplify hash generation

    Gavin Sinclair <gsinclair@soyabean.com.au> wrote:
    >
    > I like "make_hash". We would get more flexibility if "make_hash" insisted
    > on receiving two values for the block: one for key and one for value. In
    > one instance recently, I wanted to map the "filename" part of a data
    > object to the object itself. This, I think, is readable:
    >
    > map = receipts.make_hash { |r| r.filename, r }
    It'd be nice if => were merely an alias for , so that we could say

    .make_hash {|r| r.filename => r}

    with two arguments, there's always the risk of confusing them.

    Perhaps .make_hash {|r| {r.filename => r}}, where it updates the hash
    with the anon hash. Inefficient, though, and it has the ugly {{ }}.
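
    The hash-returning variant might look like the sketch below. The method
    name update_hash and the Receipt struct are hypothetical, chosen only
    for this illustration.

```ruby
module Enumerable
  # The block returns a small hash; each one is merged into the result.
  def update_hash
    result = {}
    each { |elt| result.update(yield(elt)) }
    result
  end
end

Receipt = Struct.new(:filename, :total)
receipts = [Receipt.new("a.txt", 5), Receipt.new("b.txt", 7)]

by_name = receipts.update_hash { |r| { r.filename => r } }
```

    Every element allocates a throwaway one-entry hash, which is the
    inefficiency mentioned above.
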
    > (I don't know why you put the asterisk there; Ranges are Enumerable.)
    Yeah, I forget that from time to time.

    martin
    Martin DeMello Guest

  18. #17

    Re: Proposal: Array#to_h, to simplify hash generation

    On Monday, July 21, 2003, 8:18:50 PM, dblack wrote:
    > Hi --
    > On Mon, 21 Jul 2003, Gavin Sinclair wrote:
    >> I like "make_hash". We would get more flexibility if "make_hash" insisted
    >> on receiving two values for the block: one for key and one for value. In
    >> one instance recently, I wanted to map the "filename" part of a data
    >> object to the object itself. This, I think, is readable:
    >> map = receipts.make_hash { |r| r.filename, r }
    >>
    >> Whereas in my pet case of mapping filename to size, we have
    >>
    >> map = filenames.make_hash { |fn| fn, File.stat(fn).size }
    >>
    >> And your example comes out as
    >>
    >> (1..10).make_hash { |i| i, f(i) }
    > (Wouldn't you have to wrap your two return values in an array to get
    > the above to parse?)
    I was kinda hoping not, but so be it. The thin veneer of presenting
    tested code vanishes before everyone's eyes. I was surprised to
    discover today that code (1) below works, but not code (2).

    (1) def foo; 2,4; end
    (2) def foo; return 2,4; end
    >> I've raised an RCR for this (#148).
    > I'm not sure how this differs from (rejected) RCR#12 (except for
    > having to return a key as well as a value).
    How did O.J. Simpson's second trial differ from his first? ;)

    Anyway, I think returning a key as well as a value is a significant
    difference:
    - much more flexible (I create all kinds of hashes all the time in
    my code, and could really use that flexibility)
    - less magical, more scrutable: having two values makes it clear what
    is going on, given that we're dealing with a hash. With the
    single-value to_hash/hashify, I had to keep reminding myself what
    it meant; not so with the new "make_hash".

    Gavin


    Gavin Sinclair Guest

  19. #18

    Re: Proposal: Array#to_h, to simplify hash generation

    On Mon, 21 Jul 2003 05:32:25 GMT
    Martin DeMello <martindemello@yahoo.com> wrote:
    > To me, 'hashify' implies taking an assoc array and converting it to hash
    > form (or perhaps the perl-influenced [a, b, c, d] -> {a=>b, c=>d}). I
    > still can't think of a name for the useful case :) make_hash, perhaps ..
    >
    > *(1..10).make_hash {|i| f(i)}
    >
    > or maybe the complementary hash_to and hash_from, where the block is
    > respectively the value and the key for the corresponding array entry :)
    I'd like the code to be something like this:

    module Enumerable
      def to_h
        h = Hash.new
        if block_given?
          self.each { |e| h[e] = yield(e) }
        else
          self.each { |key, value| h[key] = value }
        end
        return h
      end
    end

    >> (1..5).to_h { |n| n*n }
    => {5=>25, 1=>1, 2=>4, 3=>9, 4=>16}
    >> [ [1,2], [3,4] ].to_h
    => {1=>2, 3=>4}
    >> [ [1,2], [3,4] ].to_h.to_a
    => [[1, 2], [3, 4]]
    >> [ [1,2], [3,4] ].to_h.to_a.to_h
    => {1=>2, 3=>4}
    >> Dir["/bin/d*"].to_h { |f| File.size(f) }
    => {"/bin/dnsdomainname"=>9332, "/bin/date"=>25728, "/bin/dd"=>29492, "/bin/dmesg"=>3924, "/bin/df"=>27368, "/bin/domainname"=>9332}

    I would almost prefer [1,2,3,4].to_h => {1=>2, 3=>4}, but Hash#to_a returns a
    nested array, so that's what this code does. Plus it's easier to implement. :-)
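
    For the flat form, a sketch of two ways to get from [1,2,3,4] to
    {1=>2, 3=>4} without a dedicated method, assuming a Ruby that provides
    each_slice and each_with_object:

```ruby
flat = [1, 2, 3, 4]

# Splat the flat list straight into Hash[].
from_flat = Hash[*flat]
# => {1=>2, 3=>4}

# Or group it into pairs first, giving the nested shape Hash#to_a returns.
pairs = flat.each_slice(2).to_a
# => [[1, 2], [3, 4]]
from_pairs = pairs.each_with_object({}) { |(key, value), h| h[key] = value }
# => {1=>2, 3=>4}
```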

    Jason Creighton
    Jason Creighton Guest

  20. #19

    Re: Proposal: Array#to_h, to simplify hash generation

    >>>>> "G" == Gavin Sinclair <gsinclair@soyabean.com.au> writes:

    G> tested code vanishes before everyone's eyes. I was surprised to
    G> discover today that code (1) below works, but not code (2).

    G> (1) def foo; 2,4; end
    G> (2) def foo; return 2,4; end

    Well, you mean to say that (2) works but not (1), no?
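
    A quick way to check which form is accepted. The bare list is passed
    through eval so that the parse failure can be caught instead of
    aborting the whole script.

```ruby
# Form (2): an explicit return with a list yields an array.
def foo
  return 2, 4
end

a, b = foo

# Form (1): a bare "2, 4" as the method body is rejected by the parser.
bare_list_parses = begin
  eval("def bar; 2, 4; end")
  true
rescue SyntaxError
  false
end
```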


    Guy Decoux


    ts Guest

  21. #20

    Re: Proposal: Array#to_h, to simplify hash generation

    On Mon, Jul 21, 2003 at 11:34:39PM +0900, Gavin Sinclair wrote:
    > > I'm not sure how this differs from (rejected) RCR#12 (except for
    > > having to return a key as well as a value).
    >
    > How did O.J. Simpson's second trial differ from his first? ;)
    >
    > Anyway, I think returning a key as well as a value is a significant
    > difference:
    > - much more flexible (I create all kinds of hashes all the time in
    > my code, and could really use that flexibility)
    > - less magical, more scrutable: having two values makes it clear what
    > is going on, given that we're dealing with a hash. With the
    > single-value to_hash/hashify, I had to keep reminding myself what
    > it meant; not so with the new "make_hash".
    Can I suggest another name - "collect_hash" - since that's basically what it
    is?

    collect {... return [x,y] } =>> [[a,b], [c,d], ...]
    collect_hash {... return [x,y] } =>> {a=>b, c=>d, ...}

    In which case it clearly belongs in Enumerable - see Gavin's implementation
    in [RubyTalk:76446]

    It's still not an inverse operation to Hash#to_a, and I think there could
    still be value in that. You could simulate it of course, using

    myhash = myarray.collect_hash { |pair| pair }

    If we didn't have Hash#to_a then it could also be implemented as

    myarray = myhash.collect { |pair| pair }

    But we do, so we don't bother.
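
    The identity-block round trip can be sketched as follows. collect_hash
    is the name suggested above and is not built in; its body here is an
    assumption and is not copied from [RubyTalk:76446].

```ruby
module Enumerable
  # Like collect, but the block's [key, value] results build a hash.
  def collect_hash
    result = {}
    each do |elt|
      key, value = yield(elt)
      result[key] = value
    end
    result
  end
end

myhash  = { :x => 5, :y => 10, :z => -1 }
myarray = myhash.to_a

# An identity block simulates the inverse of Hash#to_a ...
round_trip = myarray.collect_hash { |pair| pair }

# ... just as an identity block on collect simulates Hash#to_a itself.
same_as_to_a = myhash.collect { |pair| pair }
```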

    Regards,

    Brian.

    Brian Candler Guest
