Dive into .git
1. Dive into .git
2012-05-24
Dr. NISHIO Hirokazu
http://www.nishiohirokazu.org/
2. about
• This slide deck was made for a 10-minute demo at a tech
meeting at my company.
• When you print it, 16 slides per page should be
good enough.
• This slide deck is published under the CC-BY 3.0 License.
You don’t need my permission to share it.
3. “Git is difficult!”
• “A lot of difficult concepts!”
• OK, put those concepts aside and
look at what is actually going on with Git!
• I hope this demonstration helps you draw a picture of
Git in your mind.
4. Make a repository
$ mkdir test    # No tricks, no gimmicks!
$ cd test       # Just an empty directory.
$ ls -a
. ..
$ git init
Initialized empty Git repository
in .../test/.git/ ←You got it!
“$ git init test” works the same way.
6. Let’s look into it!
$ cd .git
$ tree
.
|-- HEAD
|-- config
|-- description
|-- hooks
|-- info
|   `-- exclude
|-- objects
|   |-- info
|   `-- pack
`-- refs
    |-- heads
    `-- tags
(The contents of hooks are omitted.)
7. What changes
when you commit?
$ cd ..
$ touch README
$ git add README
$ git commit -m "initial commit"
[master (root-commit) 4dd66d3]
initial commit
8. You got 3 objects!
$ tree .git/objects
.git/objects
|-- 4d
|   `-- d66d3a32a66f3578317717ccfb18
|-- 54
|   `-- 3b9bebdc6bd5c4b22136034a95dd
|-- e6
|   `-- 9de29bb2d1d6434b8b29ae775ad8
|-- info
`-- pack
The tail of each filename is omitted.
Some changes outside of objects are omitted.
9. Look into the objects!
Make show.py:
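$ cat > show.py
#!/usr/bin/env python
import sys
import zlib
# Read the raw (compressed) object file named on the command line.
with open(sys.argv[1], "rb") as f:
    data = f.read()
# Objects are stored zlib-compressed.
data = zlib.decompress(data)
# On Python 3 the printed value carries a b'...' prefix.
print(repr(data))
Don’t forget chmod +x show.py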
10. In commit obj
$ ./show.py .git/objects/4d/d6...
'commit 201\x00
tree 543b...\n
author NISHIO Hirokazu <...> 1337655529 +0900\n
committer NISHIO Hirokazu <...> 1337655529 +0900\n
\n
initial commit\n'
Its filename was shown when you committed.
I broke the lines for readability.
Note the “tree 543b” line.
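The output above has the shape "type, space, size, NUL byte, body". As a minimal sketch (the helper name parse_object and the sample bytes are made up for illustration, not taken from the demo repository), the header can be split off like this:

```python
import zlib


def parse_object(raw):
    """Split a decompressed Git object into (type, size, body).

    Hypothetical helper for illustration; it assumes the layout seen
    in the show.py output: b"<type> <size>\x00<body>".
    """
    header, _, body = raw.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    return obj_type.decode("ascii"), int(size), body


# Made-up sample shaped like a commit object (not real repository data).
sample_body = b"tree 543b\n\ninitial commit\n"
sample = b"commit " + str(len(sample_body)).encode("ascii") + b"\x00" + sample_body
compressed = zlib.compress(sample)  # roughly what would sit on disk

obj_type, size, body = parse_object(zlib.decompress(compressed))
print(obj_type, size, size == len(body))
```

The size in the header counts only the bytes after the NUL, which is why the last printed value is True.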
11. In tree obj
$ ./show.py 54/3b...
'tree 34\x00
100644 README\x00
\xe6\x9d\xe2\x9b...'
I broke the lines.
Note the bytes e6 9d e2 9b: they are the start of a hash.
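Each entry in the tree body reads "mode, space, filename, NUL byte, 20 raw hash bytes". The following is a sketch of a parser under that assumption; parse_tree is a hypothetical name, and the sample body is rebuilt in code rather than copied from the repository:

```python
import binascii
import hashlib


def parse_tree(body):
    """Yield (mode, name, sha1_hex) for each entry of a tree object body.

    Hypothetical helper; assumes entries look like
    b"<mode> <name>\x00" followed by 20 raw SHA-1 bytes.
    """
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        raw_sha = body[nul + 1:nul + 21]  # 20 raw bytes, not hex text
        yield mode, name, binascii.hexlify(raw_sha).decode("ascii")
        pos = nul + 21


# Rebuild a body like the one above: one empty README.
empty_blob_sha = hashlib.sha1(b"blob 0\x00").digest()
tree_body = b"100644 README\x00" + empty_blob_sha

entries = list(parse_tree(tree_body))
print(len(tree_body))  # matches the "tree 34" header
print(entries)
```

Converting the 20 raw bytes to hex gives the name of the blob object to look up next.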
12. In blob obj
$ ./show.py e6/9d...
'blob 0\x00'
It is the content of README.
README is empty now, so size=0
and nothing follows \x00
13. Filename of objects
$ python -c "import hashlib;
print(hashlib.sha1(b'blob 0\x00').hexdigest())"
e69de29b...
It is the SHA-1 hash of the uncompressed content!
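The same calculation can be written for any object type. This is a sketch only; git_object_id is a name invented here, and it assumes the "type size NUL content" layout seen in the earlier slides:

```python
import hashlib


def git_object_id(obj_type, content):
    """Return the hex SHA-1 of header + content.

    Hypothetical helper; the header is b"<type> <size>\x00".
    """
    header = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_object_id("blob", b"")
print(empty_id[:2], empty_id[2:])  # directory name, then filename

# A non-empty blob gets a different id because both size and content change.
print(git_object_id("blob", b"hello\n"))
```

The first two hex digits become the directory under .git/objects and the rest becomes the filename, which is the split visible in the tree output on slide 8.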
14. Conclusion
• The repository is in .git
• There are many objects in .git/objects/
• Their contents are compressed with zlib and their
filenames are the SHA-1 hashes of the uncompressed contents.
• They are commit objects, tree objects and blob objects.
• Today I skipped tags and refs (next time?)
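To tie the points above together, here is a rough round trip that stores and reloads an object the same way. It writes into a scratch directory, not a real repository, and the function names write_loose_object and read_loose_object are assumptions made for this sketch:

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir, obj_type, content):
    """Store header + content, zlib-compressed, under <2 hex>/<38 hex>."""
    store = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid


def read_loose_object(objects_dir, oid):
    """Read the file back and decompress it, like show.py does."""
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    with open(path, "rb") as f:
        return zlib.decompress(f.read())


objects_dir = tempfile.mkdtemp()  # scratch directory standing in for .git/objects
oid = write_loose_object(objects_dir, "blob", b"")
print(oid[:2], oid[2:10] + "...")
print(repr(read_loose_object(objects_dir, oid)))
```

This only mimics the file layout described in the slides; real Git does more bookkeeping (index, refs) that the demo leaves out.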
15. Let’s try!
• Edit README and look at the changes!
• The new commit obj has a “parent <hash>” line
• The new blob has the new content of README
• Add new files and look at the changes in the tree obj
• When you add lines to README, does the blob hold
a diff or the whole content?
16.
17. Appendix
• Q: Why don’t you use
“git show --format=raw”?
• A: Because it doesn’t show some important information.
18. In tree obj...
$ git show --format=raw 543b
tree 543b
README
Oh, how can I know that its
content is in e69d?
That’s why I needed to make show.py
19. Appendix
• Q: Why don’t you use gunzip to extract it?
• A: It is compressed with zlib, but it is not a
valid gzip file (it doesn’t have the gzip header)
• If you know an easier way, please tell me!
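The difference between the two containers can be seen from Python's standard library. This is an illustrative sketch, not part of the original demo; it compresses the blob bytes from slide 12 both ways and compares the leading bytes:

```python
import gzip
import zlib

store = b"blob 0\x00"  # the uncompressed blob object from slide 12

zlib_bytes = zlib.compress(store)  # zlib container, the kind found in .git/objects
gzip_bytes = gzip.compress(store)  # gzip container, the kind gunzip expects

print("zlib starts with:", zlib_bytes[:2].hex())
print("gzip starts with:", gzip_bytes[:2].hex())

# A gzip reader rejects zlib data because the gzip magic bytes are missing.
try:
    gzip.decompress(zlib_bytes)
    gzip_reader_ok = True
except OSError:
    gzip_reader_ok = False
print("gzip reader accepts zlib data:", gzip_reader_ok)

# The zlib reader handles it, which is what show.py relies on.
print(repr(zlib.decompress(zlib_bytes)))
```

Both formats wrap the same deflate stream; only the surrounding header and trailer differ, which is why the zlib module works where gunzip does not.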