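Li Jie's Blog Backup: April 2007

Monday, April 30, 2007

There Are Still Plenty of Kind People

This afternoon I installed a few themes on my team leader's Moto a780. Not long after work he phoned to say the phone had frozen and would not boot at all. I actually knew when I downloaded the themes that the a780 has this problem, but I took my chances anyway, and of course it bit me. After dinner I collected the phone and started browsing imobile, where I found a post on the subject. I followed it step by step but could not get past step four. Fortunately the author had left a QQ number, and I got in touch. For the next hour I sat in front of the screen watching him operate my machine remotely, making one attempt after another. I have to admire this person, whose QQ name is Ziye (子夜): so talented and so generous. He got nothing else done all evening because he spent it fixing my phone. Happily, the phone eventually booted normally. As a thank-you I gave Ziye a genuine Rising Antivirus serial number. Words cannot express my gratitude.

Thank you, Ziye.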

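Saturday, April 28, 2007

YouTube Video Accelerator

If you have never been able to stand YouTube's snail-like speed, this software is definitely for you. I have been using it for a month and it works very well.

The basic principle is roughly that the software downloads the video with multiple threads, which greatly increases the download speed, so that playback hardly ever stutters any more.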
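The multi-threaded idea can be sketched roughly as follows. This is only an illustration of the general technique, not the accelerator's actual code: the helper name `split_ranges` is hypothetical, and it assumes the server lets each thread request its own byte range of the file.

```python
def split_ranges(total_size, parts):
    """Split [0, total_size) into `parts` contiguous (start, end) byte ranges.

    `end` is inclusive, matching the form used in HTTP Range headers.
    Hypothetical helper for illustration only.
    """
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges
```

Each worker thread would then fetch one range, and the pieces would be written back in order.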

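Once installed, the software starts with the system. When you watch a YouTube video, it pops up a small window to show that it is accelerating the download. One nice feature is that it also keeps the YouTube videos you have watched recently.

Give it a try.

AAALogo: A Web 2.0-Style Logo Maker

If I had to describe this software in one sentence: very powerful, very easy to use, and it produces beautiful images.

The software is nearly 20 MB and includes more than 2,000 pieces of clip art, all attractive little designs whose colors you can change and which you can modify to your liking. It also comes with many good-looking fonts. These fonts are not installed into Windows, so there is no need to worry about them slowing down your machine.

Download it from my box: part1 part2 part3

Founder Jinglei (方正静蕾) Font

China Will Remake Prison Break

Beijing Zhongbo Century Film and Television Media Co., Ltd. (北京中博世纪影视传媒有限公司) has paid US$1.2 million for the remake rights to Fox's hit TV series Prison Break and will remake it as a film. Casting is currently under way online. Shooting will start in mid-May and finish at the end of June.

A month and a half. We really shouldn't go ruining other people's work.

Source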

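Friday, April 27, 2007

The Home Page File Seems to Have Stopped Updating

I wrote two posts yesterday and found that neither showed up on the home page. Stranger still, geekv.blogspot.com is no longer reachable; more precisely, visiting it redirects straight here, yet the posts here will not update either. The feed I burned at feedburner a few days ago has also stopped updating. All sorts of problems have piled up at once.

Work has been hectic from the Lunar New Year right up to now, and I have not had one good night's sleep. The May Day holiday is here, and as usual I will stay holed up at home. What I am really looking forward to is what comes after May Day: in April I cracked a case that was neither big nor small, and it will probably earn me a commendation. If so, that would make all these months of exhaustion worthwhile.

I wrote a pile of reflections on my first six months in the police force, but my writing really isn't up to it and I don't want to discuss work on the blog, so I deleted them. Besides, half a year isn't long enough to have learned much.

I only hope I can keep working hard, shake off the bad habits I picked up as a code monkey, and broaden my thinking.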

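Thursday, April 26, 2007

An Online Monopoly-Style Game: Get the glass

A really fun online board game in the style of Monopoly, built in Flash with exquisitely made animation. You roll the dice with the mouse. Watch out: police cars and a helicopter are chasing you.

Of course the escape is hard going, and you run into plenty of challenges. Here are the ones I encountered.

1. Fill in the word. There are hints, of course, but it is a bit hard for non-native English speakers, and there is a time limit.

2. Guess the word from a picture. I got this one wrong as well.

3. Dress-up: within the time limit, dress a lady to match a given look.

4. Answer questions. They are all simple, but they vanish in an instant, which makes them fairly hard for non-native English speakers.

If you are unlucky enough to be caught and sent to jail, you have three options.

1. Serve your sentence.

2. Roll the dice. This is how I always escaped.

3. Email a friend and ask them to come and rescue you.

That is as far as I have played; I have not finished the game yet. Highly recommended.

Polishlinux Starts Its Italian Translation

Polishlinux, which I introduced earlier, has now started translating the site into Italian. Anyone who knows both English and Italian can sign up.

I asked the administrator when he planned to translate it into Chinese and got this reply: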

Well, It all depends on the community involvement. I will strongly support any translation project and I’m willing to provide the intrastructure (the domain, servers, wiki, CMS). But obviously for such big projects (and they are big, I know it myself since I did the translation from Polish to English mostly on my own) need good coordination and a dedicated team with a leader. If you can organize such team, I’ll surely support it.

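Unfortunately Polishlinux has far too much content for one person to translate. If you see this post and would like to help translate, contact me (my email is at the top right) and we can do the work together. I will provide the more localized linuxren.org domain for it.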

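Tuesday, April 24, 2007

Signature Campaign to Help Little Wang Yue

Wang Yue is a 7-year-old girl, the child of migrant workers, who was diagnosed with acute myeloid leukemia (M2a) 18 months ago.
For more than a year we have worked hard to help little Wang Yue, but our resources are too limited, so we have launched a signature campaign. We hope that MSN signatures will let more people know about this, so that everyone can help this poor child together.

In the closed beta of topic ads, helping little Wang Yue will be our first topic. We hope in this way to call on the compassion of the Chinese blogosphere, so that little Wang Yue can face her illness with more confidence and strength.

Every MSN signature you complete raises another 5 fen (0.05 yuan) for little Wang Yue.

I. Simply insert "(L)救助小王越--www.helpwy.com" into your MSN signature to take part.
II. Add the "signature helper" to your MSN contacts: qianming033@hotmail.com. Once the helper is in your MSN contact list, it records your signature once per day while the signature is online.

III. You can check your signature record here.

Official campaign home page: http://helpwy.com
Feedsky's campaign page: http://beta.feedsky.com/wangyue

Got a Bolin T-Shirt

A few days ago I read that Bolin (博邻) was giving away free T-shirts with content you can choose yourself. I sent an email on the off chance and got a reply a few days later. Just now another email arrived asking for the blog name and blog link to print on the shirt, plus my mailing address, so it looks like I will be getting a free T-shirt.

Add the cap I got from Microsoft a few days ago, and my free upper-body outfit is complete. Once I get the lower half covered too, I'll have a head-to-toe outfit worth exactly nothing. :-)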

Monday, April 23, 2007

Registered a New .cn Domain

2 yuan, which is 1 yuan above the going rate: OMFG.cn

omfg can stand for many things:

oh my f** god

Olives, Mayo, Feta, Garlic

There are many more; look them up on Google.

Sunday, April 22, 2007

Eric Schmidt at the Web 2.0 Expo

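Google's PowerPoint-like service is coming soon

Moving Posts by Hand

I have never had any remedy for this kind of manual labor. A single Blogger account can create multiple blogs, but if you want to merge several blogs into one, the only way is to move the posts by hand. That is what I am doing now: moving my two other blogs, with more than 100 posts between them, into this one. The earliest posts date back to 2004. If possible I will also tag each post to categorize them.

Most of the old posts are reposts, many of them backups from the old linuxren forum: precious material I don't want to throw away. There is also my journal from my days as a code monkey, which I feel is worth keeping as a memento.

If You Cannot Reach This Site

Visit geekv.blogspot.com instead. It seems to be a Blogger bug: blogspot and the free FTP space I found myself are being updated in sync and are exactly identical. Even the mandatory blogspot toolbar is missing, which is odd.

I am too tired these days and the blog is in a fatigue phase. I want to update it but don't know what to write.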

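Wednesday, April 18, 2007

Two Websites for Designing Your Own T-Shirt

And of course you can order the shirts you design and wear them.

DigPark (第八乐园) lets you design other items besides T-shirts: desk calendars, badges, cushions, and more. After registering, you pick the item you like and personalize it. You can upload your own pictures or use the ones DigPark provides. A T-shirt costs 48 yuan; with shipping and so on it comes to under 60 yuan apiece, which is very good value.

MyTshirt currently offers only T-shirts, which also makes it more specialized; you can find some very practical tools there. The price is 45 yuan, and it helps pick your size based on your build, which is handy.

Both sites look good, but I have not actually bought from either yet, so I cannot really compare them. In a few days I will design a custom T-shirt and order one from each site to see how they turn out.

When will we be able to DIY clothes for pets?

A Very Strange Blogger Problem

The blog is somehow being published to both an external FTP server and blogspot at once; that is, both addresses update simultaneously. Take a look at the blogspot address: the two are exactly identical, template and posts alike. Very odd.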

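Tuesday, April 17, 2007

Step Inside Your Pictures

A gadget that converts 2D pictures into 3D.
It looks as though the effect comes simply from bending the lower part of the picture by a slight angle.

Monday, April 16, 2007

The Hosting Space That Has Worn Me Down

I may as well switch to yo2; once it supports binding a domain I will move for good and save myself the headache.

I also set up a blog on yo2 for geek-zhang and gave her a domain I was not using. It seems very few people around me blog.

My J2ME studies are progressing slowly. Today I finished pulling the map of the whole city of Changzhi down from Google Earth: 50,000 images in all, stitched together seamlessly. Next comes plotting the coordinates.

linuxren.org is due for renewal again, and my kernel studies are about to begin.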

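Thursday, April 12, 2007

Prison Break Season 2 Recap

Ubuntu 7.04 CDs Can Be Requested for Free

Although the final release is not out yet, you can already pre-order it for free. The following packages are available: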

1 Ubuntu CD (1 64-bit PC Edition)
1 Ubuntu CD (1 PC Edition)
3 Ubuntu CDs (3 PC Edition)
3 Ubuntu CDs (2 PC Edition, 1 64-bit PC Edition)

I just chose the pack of three 32-bit CDs.

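Wednesday, April 11, 2007

Why Isn't Linux a Real-Time Operating System?

If you subscribe to the linuxnewbies mailing list you will see many people discussing this question. It is worth looking into.

The Scary myqq

Way too fake~

Computer Drawing Again

Videos like this seem to be more and more common. If you have watched Lost, you will know who is being drawn.

I wonder whether people who actually draw with pens feel the urge to toss the pen and pick up a mouse after watching these videos. When even Leonardo da Vinci's Mona Lisa can be reproduced so faithfully, it really might be time to think about switching tools.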

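Tuesday, April 10, 2007

Is Google News Alerts Doing Archaeology Too?

I have subscribed to Google Alerts ever since I got Gmail in June 2004, and I look through them whenever I can. Today I suddenly noticed that Google had dug up something old. I had just received the email shown in the picture above, and when I clicked through, the date turned out to be 2003.

Reflection Image

It took me an hour, copying an example step by step (clumsy me, I had never used Photoshop before). I also made a Matrix-style effect, but it never looked quite right and I don't want to embarrass myself, so I am not posting it. Click to enlarge. I plan to design a few more decent logos and then print them on T-shirts.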

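Sunday, April 8, 2007

Painting the Mona Lisa in MS Paint

I once posted a video of someone drawing a sports car in MS Paint; here is another, this time of the Mona Lisa. Wow...

The Blog Is Often Unreachable

The current free hosting space is fast: several hundred Blogger posts publish in one go without interruption. The one headache is that entering lijie.org often produces the error shown in the picture above. There is nothing I can do about it. The site is in fact still reachable if you add the index file name: lijie.org/index.html. If you really can't be bothered, just subscribe to the feed.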

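Saturday, April 7, 2007

The First iPod Virus Appears

It is called Podloso and does no real harm in itself; it exists only to show that viruses can exist on any platform. The virus infects iPods that have Linux installed. When activated, it scans the hard disk and infects all executable files (ELF format). It does not affect the operation of the device; it is merely a symbolic reminder that viruses are everywhere.

Goodbye, C-Note

C-Note is probably the luckiest of the fugitives in Prison Break: he did not die and was finally reunited with his family, which also means he is unlikely to appear in season three. Here is an interview with Rockmond Dunbar, the actor who plays C-Note, in which he talks about Prison Break and about his upcoming new show, Heartland.

The article says C-Note might return in season three, but the chances are slim. Before shooting began on episode 18 of season two, the producers called Dunbar and said: "We are happy to tell you that C-Note doesn't die, but we don't know what to do with you. You can keep appearing in season three for less pay, or you can leave and find something you enjoy doing. We would hate to see you go, but we also don't want you just sitting there collecting a check every day, and neither do you." Dunbar found it a hard decision, because in his view C-Note absolutely should come back. But there was nothing to be done, so he chose to leave and took on a new script, Heartland, in which he plays a surgeon.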

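Friday, April 6, 2007

Prison Break Season 3 (Prison Break III): A Roundup of Details

Season two of Prison Break has just ended and everyone is impatiently waiting for season three, whose air date has been tentatively set for August this year. People are eager for news about it. I have posted twice before (1, 2) with some details; here is a roundup.

Not every season-two actor will appear in season three; C-Note, for example, will not. Season three will be shot in Dallas, Louisiana, and Florida, and of course in Panama as well. It will also be somewhat more violent and bloody.

Four new faces will be introduced, none of them American.

The story in season three will unfold differently from season two and will be more tense and exciting.

Either Michael or Linc will lose his life in season three, though it is not yet settled which. (My guess is Linc.)

A woman as smart as Michael will appear. The current plan is for her to be on Michael's side, and she will have a romantic entanglement with one of the two brothers.

Sara and Michael will get a chance to be together and look after each other.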

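Thursday, April 5, 2007

Nokia and Others Join the Linux Foundation [Translation]

Nokia, Marvell, and VirtualLogix have joined the Linux Foundation (LF), the non-profit organization formed by the merger of OSDL and FSG in January 2007, whose aim is to provide useful services to the community and the Linux industry and to protect and promote the standardization of Linux. This brings LF's membership to 86.

Nokia's Maemo platform is based on Linux, and Nokia's mobile browser is also open source.

"It is important to recognize that Linux can be used in many different environments, but there are also many challenges. Working with Marvell, Nokia, and VirtualLogix will help us reinforce that recognition and will drive Linux forward."

Marvell will focus on standardizing mobile-phone and embedded Linux platforms and on applying that standard to a wider range of devices.

VirtualLogix will draw on its experience with real-time virtualization platforms to help hardware vendors put Linux on their devices. "With virtualization, manufacturers can cut costs, run multiple operating systems on a single hardware platform, and improve hardware performance."

Nokia will continue working on its Linux-based technologies, including the Internet Tablet (a handheld device for wireless Internet access, such as the Nokia 770) and a vendor-neutral environment.

Original article

Bright Prospects for Phones with Linux Preinstalled [Translation]

"This year, 2007, 8.1 million phones will ship with Linux, and by 2012 that number will grow to 127 million. It is only a matter of time before Linux sweeps the mobile-phone operating system market," says ABI's latest report.

"But latency is holding back Linux's further progress on single-processor phones, because Linux is not yet a real-time operating system. Still, there are ways to solve the problem, such as building a real-time system on top of the Linux kernel."

"Vendors also need to agree on open APIs so that third parties can develop their own applications."

"After all, the cost of Linux is very low, or even zero."

Damn Small Linux

Damn Small Linux, DSL for short, is only 50 MB in size. It began purely out of personal interest, as an experiment to see how many useful Linux programs could fit into 50 MB. Gradually DSL attracted a lot of attention, and hundreds of people have contributed to improving it. It now offers a fairly complete desktop environment: you can install the software you like, there is a well-developed backup/restore system, and it can also serve as an SSH/FTP/HTTPD server. Here is the list of what DSL includes: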

XMMS (MP3, CD Music, and MPEG), FTP client, Dillo web browser, Netrik web browser, FireFox, spreadsheet, Sylpheed email, spellcheck (US English), a word-processor (Ted), three editors (Beaver, Vim, and Nano [Pico clone]), graphics editing and viewing (Xpaint, and xzgv), Xpdf (PDF Viewer), emelFM (file manager), Naim (AIM, ICQ, IRC), VNCviwer, Rdesktop, SSH/SCP server and client, DHCP client, PPP, PPPoE (ADSL), a web server, calculator, generic and GhostScript printer support, NFS, Fluxbox and JWM window managers, games, system monitoring apps, a host of command line tools, USB support, and pcmcia support, some wireless support.

In addition, DSL can boot from a LiveCD, from a USB stick, from within an already installed operating system, and so on. Here is the detailed list:

Boot from a business card CD as a live linux distribution (LiveCD)
Boot from a USB pen drive
Boot from within a host operating system (that's right, it can run *inside* Windows)
Run very nicely from an IDE Compact Flash drive via a method we call "frugal install"
Transform into a Debian OS with a traditional hard drive install
Run light enough to power a 486DX with 16MB of Ram
Run fully in RAM with as little as 128MB (you will be amazed at how fast your computer can be!)
Modularly grow -- DSL is highly extendable without the need to customize

A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux

(or, "Size Is Everything")

--------------------------------------------------------------------------------

She studied it carefully for about 15 minutes. Finally, she spoke. "There's something written on here," she said, frowning, "but it's really teensy."
[Dave Barry, "The Columnist's Caper"]

If you're a programmer who's become fed up with software bloat, then may you find herein the perfect antidote.

This document explores methods for squeezing excess bytes out of simple programs. (Of course, the more practical purpose of this document is to describe a few of the inner workings of the ELF file format and the Linux operating system. But hopefully you can also learn something about how to make really teensy ELF executables in the process.)

Please note that the information and examples given here are, for the most part, specific to ELF executables on a Linux platform running under an Intel-386 architecture. I imagine that a good bit of the information is applicable to other ELF-based Unices, but my experiences with such are too limited for me to say with certainty.

The assembly code that appears in this document is written for use with Nasm. (Besides being more appropriate for our needs, Nasm's syntax beats the hell out of AT&T syntax for anyone who learned x86 assembly language before learning to use Gas.) Nasm is freely available and extremely portable; see http://www.web-sites.co.uk/nasm/.

Please also note that if you aren't a little bit familiar with assembly code, you may find parts of this document sort of hard to follow.

--------------------------------------------------------------------------------

In order to start, we need a program. Almost any program will do, but the simpler the program the better, since we're more interested in how small we can make the executable than what the program does.

Let's take an incredibly simple program, one that does nothing but return a number back to the operating system. Why not? After all, Unix already comes with no less than two such programs: true and false. Since 0 and 1 are already taken, we'll use the number 42.

So, here is our first version:

/* tiny.c */
int main(void) { return 42; }

which we can compile and test like so:

$ gcc -Wall tiny.c
$ ./a.out ; echo $?
42

So. How big is it? Well, on my machine, I get:

$ wc -c a.out
3998 a.out

(Yours will probably differ some.) Admittedly, that's pretty small by today's standards, but it's almost certainly bigger than it needs to be.

The obvious first step is to strip the executable:

$ gcc -Wall -s tiny.c
$ ./a.out ; echo $?
42
$ wc -c a.out
2632 a.out

That's certainly an improvement. For the next step, how about optimizing?

$ gcc -Wall -s -O3 tiny.c
$ wc -c a.out
2616 a.out

That also helped, but only just. Which makes sense: there's hardly anything there to optimize.

It seems unlikely that there's much else we can do to shrink a one-statement C program. We're going to have to leave C behind, and use assembler instead. Hopefully, this will cut out all the extra overhead that C programs automatically incur.

So, on to our second version. All we need to do is return 42 from main(). In assembly language, this means that the function should set the accumulator, eax, to 42, and then return:

; tiny.asm
BITS 32
GLOBAL main
SECTION .text
main:
mov eax, 42
ret

We can then build and test like so:

$ nasm -f elf tiny.asm
$ gcc -Wall -s tiny.o
$ ./a.out ; echo $?
42

(Hey, who says assembly code is difficult?) And now how big is it?

$ wc -c a.out
2604 a.out

Looks like we shaved off a measly twelve bytes. So much for all the extra overhead that C automatically incurs, eh?

Well, the problem is that we are still incurring a lot of overhead by using the main() interface. The linker is still adding an interface to the OS for us, and it is that interface that actually calls main(). So how do we get around that if we don't need it?

The actual entry point that the linker uses by default is the symbol with the name _start. When we link with gcc, it automatically includes a _start routine, one that sets up argc and argv, among other things, and then calls main().

So, let's see if we can bypass this, and define our own _start routine:

; tiny.asm
BITS 32
GLOBAL _start
SECTION .text
_start:
mov eax, 42
ret

Will gcc do what we want?

$ nasm -f elf tiny.asm
$ gcc -Wall -s tiny.o
tiny.o(.text+0x0): multiple definition of `_start'
/usr/lib/crt1.o(.text+0x0): first defined here
/usr/lib/crt1.o(.text+0x36): undefined reference to `main'

No. Well, actually, yes it will, but first we need to learn how to ask for what we want.

It so happens that gcc recognizes an option called -nostartfiles. From the gcc info pages:

-nostartfiles
Do not use the standard system startup files when linking. The standard libraries are used normally.

Aha! Now let's see what we can do:

$ nasm -f elf tiny.asm
$ gcc -Wall -s -nostartfiles tiny.o
$ ./a.out ; echo $?
Segmentation fault
139

Well, gcc didn't complain, but the program doesn't work. What went wrong?

What went wrong is that we treated _start as if it were a C function, and tried to return from it. In reality, it's not a function at all. It's just a symbol in the object file which the linker uses to locate the program's entry point. When our program is invoked, it's invoked directly. If we were to look, we would see that the value on the top of the stack was the number 1, which is certainly very un-address-like. In fact, what is on the stack is our program's argc value. After this comes the elements of the argv array, including the terminating NULL element, followed by the elements of envp. And that's all. There is no return address on the stack.

So, how does _start ever exit? Well, it calls the exit() function! That's what it's there for, after all.

So, let's try this again. We're going to call exit(), which is a function that takes a single integer argument. So all we need to do is push the number onto the stack and call the function. (We also need to declare exit() as external.) Here's our assembly:

; tiny.asm
BITS 32
EXTERN exit
GLOBAL _start
SECTION .text
_start:
push dword 42
call exit

And we build and test as before:

$ nasm -f elf tiny.asm
$ gcc -Wall -s -nostartfiles tiny.o
$ ./a.out ; echo $?
42

Success at last! And now how big is it?

$ wc -c a.out
1340 a.out

Almost half the size! Not bad. Not bad at all. Hmmm ... so what other interesting obscure options does gcc have?

Well, this one, appearing immediately after -nostartfiles in the documentation, is certainly eye-catching:

-nostdlib
Don't use the standard system libraries and startup files when linking. Only the files you specify will be passed to the linker.

That's gotta be worth investigating:

$ gcc -Wall -s -nostdlib tiny.o
tiny.o(.text+0x6): undefined reference to `exit'

Oops. That's right ... exit() is, after all, a function in the standard C library.

Maybe we should be calling _exit() instead? Well, you can try it, but you'll get the same results. _exit() isn't part of the standard C library, but it is still part of libc. It's a function, after all -- it has to be filled in from somewhere.

Okay. But surely, we don't need to use libc just to leave a program, do we?

No, we don't. If we're willing to leave behind all pretenses of portability, we can make our program exit without help from any libraries. First, though, we need to know how to make a system call under Linux.

--------------------------------------------------------------------------------

Linux, like most operating systems, provides basic necessities to the programs it hosts via system calls. This includes things like opening a file, reading and writing to file handles -- and, of course, shutting down a process.

The Linux system call interface is a single instruction: int 0x80. All system calls are done via this interrupt. To make a system call, eax should contain a number that indicates which system call is being invoked, and other registers are used to hold the arguments, if any. If the system call takes one argument, it will be in ebx; a system call with two arguments will use ebx and ecx. Likewise, edx, esi, and edi are used if a third, fourth, or fifth argument is required, respectively. Upon return from a system call, eax will contain the return value. If an error occurs, eax will contain a negative value, with the absolute value indicating the error.

The numbers for the different system calls are listed in /usr/include/asm/unistd.h. A quick peek will tell us that the exit system call is assigned the number 1. Like the C function, it takes one argument, the value to return to the parent process, and so this will go into ebx.
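The register convention just described can be written down as a small Python sketch. It only models which value goes into which register; it does not make a real system call. The helper names `syscall_registers` and `decode_return` are hypothetical, and the error handling follows the simplified rule given in the text (any negative value in eax is an error).

```python
# Argument registers for the i386 int 0x80 convention, in the order
# given in the text above.
ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi")

SYS_EXIT = 1  # system call number for exit, as stated in the text


def syscall_registers(number, *args):
    """Return a dict of register name -> value to load before int 0x80."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("this sketch covers at most five arguments")
    regs = {"eax": number}
    regs.update(zip(ARG_REGISTERS, args))
    return regs


def decode_return(eax):
    """Interpret eax after the call: (value, None) or (None, error number)."""
    # Reinterpret the 32-bit register contents as a signed integer.
    signed = eax - (1 << 32) if eax & 0x80000000 else eax
    if signed < 0:
        return None, -signed
    return signed, None
```

For the exit call used in this tutorial, `syscall_registers(SYS_EXIT, 42)` puts 1 in eax and 42 in ebx, which is exactly what the next version of the program does.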

We now know all we need to know to create the next version of our program, one that won't need assistance from any external functions to work:

; tiny.asm
BITS 32
GLOBAL _start
SECTION .text
_start:
mov eax, 1
mov ebx, 42
int 0x80

Here we go:

$ nasm -f elf tiny.asm
$ gcc -Wall -s -nostdlib tiny.o
$ ./a.out ; echo $?
42

Ta-da! And the size?

$ wc -c a.out
372 a.out

Now that's tiny! Almost a fourth the size of the previous version!

So ... can we do anything else to make it even smaller?

How about using shorter instructions?

If we generate a list file for the assembly code, we'll find the following:

00000000 B801000000 mov eax, 1
00000005 BB2A000000 mov ebx, 42
0000000A CD80 int 0x80

Well, gee, we don't need to initialize all of ebx, since the operating system is only going to use the lowest byte. Setting bl alone will be sufficient, and will take two bytes instead of five.

We can also set eax to one by xor'ing it to zero and then using a one-byte increment instruction; this will save two more bytes.

00000000 31C0 xor eax, eax
00000002 40 inc eax
00000003 B32A mov bl, 42
00000005 CD80 int 0x80

I think it's pretty safe to say that we're not going to make this program any smaller than that.
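The byte counts are easy to tally by hand, but here is a minimal Python sketch that does it from the machine-code bytes in the two listings above. The helper `visible_status` is a hypothetical name, and it assumes (as the text says) that only the lowest byte of ebx reaches the parent process.

```python
# Machine-code bytes copied from the two listings above.
ORIGINAL = bytes.fromhex("B801000000" "BB2A000000" "CD80")
SHORTER = bytes.fromhex("31C0" "40" "B32A" "CD80")

# Number of bytes saved by the shorter encodings.
saved = len(ORIGINAL) - len(SHORTER)


def visible_status(ebx):
    """Exit status seen by the parent: only the lowest byte of ebx."""
    return ebx & 0xFF
```

Since only the low byte matters, leaving the upper 24 bits of ebx uninitialized does not change what `echo $?` reports.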

As an aside, we might as well stop using gcc to link our executable, seeing as we're not using any of its added functionality, and just call the linker, ld, ourselves:

$ nasm -f elf tiny.asm
$ ld -s tiny.o
$ ./a.out ; echo $?
42
$ wc -c a.out
368 a.out

Four bytes smaller. (Hey! Didn't we shave five bytes off? Well, we did, but alignment considerations within the ELF file caused it to require an extra byte of padding.)

So ... have we reached the end? Is this as small as we can go?

Well, hm. Our program is now seven bytes long. Do ELF files really require 361 bytes of overhead? What's in this file, anyway?

We can peek into the contents of the file using objdump:

$ objdump -x a.out | less

The output may look like gibberish, but right now let's just focus on the list of sections:

Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000007 08048080 08048080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .comment 0000001c 00000000 00000000 00000087 2**0
CONTENTS, READONLY

The complete .text section is listed as being seven bytes long, just as we specified. So it seems safe to conclude that we now have complete control of the machine-language content of our program.

But then there's this other section named ".comment". Who ordered that? And it's 28 bytes long, even! We may not be sure what this .comment section is, but it seems a good bet that it isn't a necessary feature....

The .comment section is listed as being located at file offset 00000087 (hexadecimal). If we use a hexdump program to look at that area of the file, we will see:

00000080: 31C0 40B3 2ACD 8000 5468 6520 4E65 7477 1.@.*...The Netw
00000090: 6964 6520 4173 7365 6D62 6C65 7220 302E ide Assembler 0.
000000A0: 3938 0000 2E73 796D 7461 6200 2E73 7472 98...symtab..str

Well, well, well. Who'd've thought that Nasm would undermine our quest like this? Maybe we should switch to using gas, AT&T syntax notwithstanding....
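If you'd rather not squint at the hexdump, the same bytes can be sliced up in a few lines of Python. This is just a sketch over the three rows shown above, using the offset (0x87) and size (0x1c) that objdump reported for .comment.

```python
# The three hexdump rows shown above, starting at file offset 0x80.
DUMP_BASE = 0x80
dump = bytes.fromhex(
    "31C040B32ACD8000546865204E657477"
    "69646520417373656D626C657220302E"
    "393800002E73796D746162002E737472"
)

# Values reported by objdump for the .comment section.
COMMENT_OFFSET = 0x87
COMMENT_SIZE = 0x1C

start = COMMENT_OFFSET - DUMP_BASE
comment = dump[start:start + COMMENT_SIZE]

# The section is a string surrounded by NUL bytes.
text = comment.strip(b"\x00").decode("ascii")

# Everything before the section in this dump is our seven bytes of code.
code = dump[:start]
```

The 28 bytes turn out to be Nasm's version string plus a NUL on either side.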

Alas, if we do:

; tiny.s
.globl _start
.text
_start:
xorl %eax, %eax
incl %eax
movb $42, %bl
int $0x80

... we will find:

$ gcc -s -nostdlib tiny.s
$ ./a.out ; echo $?
42
$ wc -c a.out
368 a.out

... no difference!

Well, actually there is some difference. Turning once again to objdump, we see:

Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000007 08048074 08048074 00000074 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000000 0804907c 0804907c 0000007c 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 0804907c 0804907c 0000007c 2**2
ALLOC

No comment section, but now we have two useless sections for storing our nonexistent data. And even though these sections are zero bytes long, they incur overhead, bringing our file size up for no good reason.

Okay, so just what is all this overhead, and how do we get rid of it?

Well, to answer these questions, we must begin diving into some real wizardry. We need to understand the ELF format.

--------------------------------------------------------------------------------

The canonical document describing the ELF format for Intel-386 architectures can be found at ftp://tsx.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz. If you'd rather not muck around with Postscript documents, you can find a flat-text version at http://www.muppetlabs.com/~breadbox/software/ELF.txt. This specification covers a lot of territory, so if you'd prefer to not read the whole thing yourself, I'll understand. Basically, here's what we need to know:

Every ELF file begins with a structure called the ELF header. This structure is 52 bytes long, and contains several pieces of information that describe the contents of the file. For example, the first sixteen bytes contain an "identifier", which includes the file's magic-number signature (7F 45 4C 46), and some one-byte flags indicating that the contents are 32-bit or 64-bit, little-endian or big-endian, etc. Other fields in the ELF header contain information such as: the target architecture; whether the ELF file is an executable, an object file, or a shared-object library; the program's starting address; and the locations within the file of the program header table and the section header table.

These two tables can appear anywhere in the file, but typically the former appears immediately following the ELF header, and the latter appears at or near the end of the file. The two tables serve similar purposes, in that they identify the component parts of the file. However, the section header table focuses more on identifying where the various parts of the program are within the file, while the program header table describes where and how these parts are to be loaded into memory. In brief, the section header table is for use by the compiler and linker, while the program header table is for use by the program loader. The program header table is optional for object files, and in practice is never present. Likewise, the section header table is optional for executables -- but is almost always present!
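The 52-byte figure can be cross-checked with Python's struct module. The format strings below are my own transcription of the 32-bit ELF header and program header entry layouts (little-endian, as on Intel 386), so treat them as a sketch rather than an authoritative definition; `has_elf_magic` is a hypothetical helper name.

```python
import struct

# Elf32_Ehdr: 16-byte e_ident, then e_type, e_machine (16-bit),
# e_version, e_entry, e_phoff, e_shoff, e_flags (32-bit),
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx (16-bit).
EHDR_FORMAT = "<16sHHIIIIIHHHHHH"

# Elf32_Phdr (one program header table entry): eight 32-bit fields.
PHDR_FORMAT = "<IIIIIIII"

EHDR_SIZE = struct.calcsize(EHDR_FORMAT)
PHDR_SIZE = struct.calcsize(PHDR_FORMAT)

ELF_MAGIC = bytes([0x7F]) + b"ELF"  # 7F 45 4C 46


def has_elf_magic(data):
    """True if `data` starts with the four-byte ELF signature."""
    return data[:4] == ELF_MAGIC
```

A file's first four bytes can be checked with something like `has_elf_magic(open("a.out", "rb").read(4))`.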

So, this is the answer to our first question. A fair piece of the overhead in our program is a completely unnecessary section header table, and maybe some equally useless sections that don't contribute to our program's memory image.

So, we turn to our second question: how do we go about getting rid of all that?

Alas, we're on our own here. None of the standard tools will deign to make an executable without a section header table of some kind. If we want such a thing, we'll have to do it ourselves.

This doesn't quite mean that we have to pull out a binary editor and code the hexadecimal values by hand, though. Good old Nasm has a flat binary output format, which will serve us well. All we need now is the image of an empty ELF executable, which we can fill in with our program. Our program, and nothing else.

We can look at the ELF specification, and /usr/include/linux/elf.h, and executables created by the standard tools, to figure out what our empty ELF executable should look like. But, if you're the impatient type, you can just use the one I've supplied here:

BITS 32

org 0x08048000

ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1 ; e_ident
times 9 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx

ehdrsize equ $ - ehdr

phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align

phdrsize equ $ - phdr

_start:

; your program here

filesize equ $ - $$

This image contains an ELF header, identifying the file as an Intel 386 executable, with no section header table and a program header table containing one entry. Said entry instructs the program loader to load the entire file into memory (it's normal behavior for a program to include its ELF header and program header table in its memory image) starting at memory address 0x08048000 (which is the default address for executables to load), and to begin executing the code at _start, which appears immediately after the program header table. No .data segment, no .bss segment, no commentary -- nothing but the bare necessities.

So, let's add in our little program:

; tiny.asm
org 0x08048000

;
; (as above)
;

_start:
mov bl, 42
xor eax, eax
inc eax
int 0x80

filesize equ $ - $$

and try it out:

$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42

We have just created an executable completely from scratch. How about that? And now, take a look at its size:

$ wc -c a.out
91 a.out

Ninety-one bytes. Less than one-fourth the size of our previous attempt, and less than one-fortieth the size of our first!
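For readers without Nasm at hand, the same 91 bytes can be assembled with a short Python sketch. It follows the layout of the listing above field by field; the function name `build_image` is hypothetical, and the machine-code bytes are taken from the earlier list file. It only builds the bytes in memory and does not write or run anything.

```python
import struct

LOAD_ADDR = 0x08048000
EHDR_SIZE, PHDR_SIZE = 52, 32

# mov bl, 42 / xor eax, eax / inc eax / int 0x80
CODE = bytes.fromhex("B32A" "31C0" "40" "CD80")


def build_image(code=CODE):
    """Return the bytes of a minimal ELF executable laid out as above."""
    filesize = EHDR_SIZE + PHDR_SIZE + len(code)
    e_ident = bytes([0x7F]) + b"ELF" + bytes([1, 1, 1]) + bytes(9)
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        e_ident,
        2,                                  # e_type: executable
        3,                                  # e_machine: Intel 386
        1,                                  # e_version
        LOAD_ADDR + EHDR_SIZE + PHDR_SIZE,  # e_entry: address of _start
        EHDR_SIZE,                          # e_phoff: phdr follows ehdr
        0, 0,                               # e_shoff, e_flags
        EHDR_SIZE, PHDR_SIZE,               # e_ehsize, e_phentsize
        1, 0, 0, 0,                         # e_phnum, e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIIIIIII",
        1, 0,                  # p_type (loadable), p_offset
        LOAD_ADDR, LOAD_ADDR,  # p_vaddr, p_paddr
        filesize, filesize,    # p_filesz, p_memsz
        5, 0x1000,             # p_flags (read + execute), p_align
    )
    return ehdr + phdr + code
```

The total is simply 52 + 32 + 7 bytes: one ELF header, one program header table entry, and the program.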

What's more, this time we can account for every last byte. We know exactly what's in the executable, and why it needs to be there. This is, finally, the limit. We can't get any smaller than this.

Or can we?

--------------------------------------------------------------------------------

Well, if you actually stopped to read the ELF specification, you might have noticed a couple of facts. 1) The different parts of an ELF file are permitted to be located anywhere (except the ELF header, which must be at the top of the file), and they can even overlap each other. 2) Some of the fields in the headers aren't actually used.

In particular, I'm thinking of those nine bytes of zeros at the end of the 16-byte identification field. They are pure padding, to make room for future expansion of the ELF standard. So the OS shouldn't care at all what's in there. And we're already loading everything into memory anyway, and our program is only seven bytes long....

Can we put our code inside the ELF header itself?

Why not?

; tiny.asm

BITS 32

org 0x08048000

ehdr: ; Elf32_Ehdr
db 0x7F, "ELF" ; e_ident
db 1, 1, 1, 0
_start: mov bl, 42
xor eax, eax
inc eax
int 0x80
db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx

ehdrsize equ $ - ehdr

phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align

phdrsize equ $ - phdr

filesize equ $ - $$

After all, bytes are bytes!

$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
$ wc -c a.out
84 a.out

Not bad, eh?

Now we've really gone as low as we can go. Our file is exactly as long as one ELF header and one program header table entry, both of which we absolutely require in order to get loaded into memory and run. So there's nothing left to reduce now!

Except ...

Well, what if we could do the same thing to the program header table that we just did to the program? Have it overlap with the ELF header, that is. Is it possible?

It is indeed. Take a look at our program. Note that the last eight bytes in the ELF header bear a certain kind of resemblance to the first eight bytes in the program header table. A certain kind of resemblance that might be described as "identical".
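That claim is quick to confirm with a minimal Python sketch that packs just the two eight-byte stretches in question, using the field values from the listing above (little-endian, as on Intel 386).

```python
import struct

# Last four 16-bit fields of the ELF header from the listing above:
# e_phnum = 1, e_shentsize = 0, e_shnum = 0, e_shstrndx = 0.
ehdr_tail = struct.pack("<HHHH", 1, 0, 0, 0)

# First two 32-bit fields of the program header table entry:
# p_type = 1, p_offset = 0.
phdr_head = struct.pack("<II", 1, 0)

# If the two byte strings match, they can share the same file bytes.
overlap = len(ehdr_tail) if ehdr_tail == phdr_head else 0
```

Both come out as 01 00 00 00 00 00 00 00, so eight bytes can be shared between the two structures.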

So ...

; tiny.asm

BITS 32

org 0x08048000

ehdr:
db 0x7F, "ELF" ; e_ident
db 1, 1, 1, 0
_start: mov bl, 42
xor eax, eax
inc eax
int 0x80
db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
phdr: dd 1 ; e_phnum ; p_type
; e_shentsize
dd 0 ; e_shnum ; p_offset
; e_shstrndx
ehdrsize equ $ - ehdr
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr

filesize equ $ - $$

And sure enough, Linux doesn't mind our parsimony one bit:

$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
$ wc -c a.out
76 a.out

Now we've really gone as low as we can go. There's no way to overlap the two structures any more than this. The bytes simply don't match up. This is the end of the line!

Unless, that is, we could change the contents of the structures to make them match even further....

How many of these fields is Linux actually looking at, anyway? For example, does Linux actually check to see if the e_machine field contains 3 (indicating an Intel 386 target), or is it just assuming that it does?

As a matter of fact, in that case it does. But a surprising number of other fields are being quietly ignored.

So: Here's what is and isn't essential in the ELF header. The first four bytes have to contain the magic number, or else Linux won't touch it. The other three bytes in the e_ident field are not checked, however, which means we have no less than twelve contiguous bytes we can set to anything at all. e_type has to be set to 2, to indicate an executable, and e_machine has to be 3, as just noted. e_version is, like the version number inside e_ident, completely ignored. (Which is sort of understandable, seeing as currently there's only one version of the ELF standard.) e_entry naturally has to be valid, since it points to the start of the program. And clearly, e_phoff needs to contain the correct offset of the program header table in the file, and e_phnum needs to contain the right number of entries in said table. e_flags, however, is documented as being currently unused for Intel, so it should be free for us to reuse. e_ehsize is supposed to be used to verify that the ELF header has the expected size, but Linux pays it no mind. e_phentsize is likewise for validating the size of the program header table entries. This one is checked, but only in 2.2 kernels after version 2.2.17. Earlier 2.2 kernels ignored it, as does 2.4.0. And everything else in the ELF header is about the section header table, which doesn't come into play with executable files.

And now how about the program header table entry? Well, p_type has to contain 1, to mark it as a loadable segment. p_offset really needs to have the correct file offset to start loading. Likewise, p_vaddr needs to contain the proper load address. Note, however, that we're not required to load at 0x08048000. Almost any address can be used as long as it's above 0x00000000, below 0x80000000, and page-aligned. The p_paddr field is documented as being ignored, so that's automatically free. p_filesz indicates how many bytes to load out of the file into memory, and p_memsz indicates how large the memory segment needs to be, so these numbers ought to be relatively sane. p_flags indicates what permissions to give the memory segment. It needs to be readable (4), or it won't be usable at all, and it needs to also be executable (1), or else we can't execute code in it. Other bits can probably be set as well, but we need to have those at minimum. Finally, p_align gives the alignment requirements for the memory segment. This field is mainly used when relocating segments containing position-independent code (as for shared libraries), so for an executable file Linux will ignore whatever garbage we store here.
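The constraints on p_flags and p_vaddr can be summarized in a short Python sketch. This only restates the rules given in the paragraph above and is not the kernel's actual logic; the function names are hypothetical, the page size of 0x1000 is an assumption for Intel 386, and the write bit (2) comes from the ELF specification rather than from the text.

```python
PAGE_SIZE = 0x1000  # assumed page size on Intel 386

PF_X, PF_W, PF_R = 1, 2, 4  # permission bits of p_flags


def describe_flags(p_flags):
    """Render p_flags as an 'rwx'-style string."""
    return "".join(
        letter if p_flags & bit else "-"
        for letter, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X))
    )


def load_address_ok(p_vaddr):
    """Apply the constraints the text gives for a usable load address."""
    return 0 < p_vaddr < 0x80000000 and p_vaddr % PAGE_SIZE == 0


def flags_ok(p_flags):
    """The segment must be at least readable and executable."""
    return p_flags & (PF_R | PF_X) == (PF_R | PF_X)
```

The value 5 used throughout this tutorial is therefore read plus execute, with no write permission.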

All in all, that's a fair bit of leeway. In particular, a bit of scrutiny will reveal that most of the necessary fields in the ELF header are in the first half - the second half is almost completely free for munging. With this in mind, we can interpose the two structures quite a bit more than we did previously:

; tiny.asm

BITS 32

org 0x00200000

db 0x7F, "ELF" ; e_ident
db 1, 1, 1, 0
_start:
mov bl, 42
xor eax, eax
inc eax
int 0x80
db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
phdr: dd 1 ; e_shoff ; p_type
dd 0 ; e_flags ; p_offset
dd $$ ; e_ehsize ; p_vaddr
; e_phentsize
dw 1 ; e_phnum ; p_paddr
dw 0 ; e_shentsize
dd filesize ; e_shnum ; p_filesz
; e_shstrndx
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align

filesize equ $ - $$

As you can (hopefully) see, the first twenty bytes of the program header table now overlap the last twenty bytes of the ELF header. The two dovetail quite nicely, actually. There are only two parts of the ELF header within the overlapped region that matter. The first is the e_phnum field, which just happens to coincide with the p_paddr field, one of the few fields in the program header table which is definitely ignored. The other is the e_phentsize field, which coincides with the top half of the p_vaddr field. These are made to match up by selecting a non-standard load address for our program, with a top half equal to 0x0020.
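The overlap arithmetic can be double-checked with a small Python sketch. The field offsets are those of the 32-bit ELF structures; the constant names are invented for this illustration, and the values packed for p_vaddr and p_paddr are simply the ones that appear in the listing above.

```python
import struct

EHDR_SIZE = 52               # size of a 32-bit ELF header
PHDR_OFFSET = 32             # value of (phdr - $$) in the listing
LOAD_ADDRESS = 0x00200000    # the org value in the listing

E_PHENTSIZE_OFF = 42                 # offsets within the ELF header
E_PHNUM_OFF = 44
P_VADDR_OFF = PHDR_OFFSET + 8        # offsets within the file
P_PADDR_OFF = PHDR_OFFSET + 12

# How many bytes the two structures share.
overlap = EHDR_SIZE - PHDR_OFFSET

# The bytes NASM lays down for "dd $$" and for "dw 1" followed by "dw 0".
p_vaddr_bytes = struct.pack("<I", LOAD_ADDRESS)
p_paddr_bytes = struct.pack("<HH", 1, 0)

# Read the same bytes back as the 16-bit ELF header fields they double as.
e_phentsize = struct.unpack_from("<H", p_vaddr_bytes,
                                 E_PHENTSIZE_OFF - P_VADDR_OFF)[0]
e_phnum = struct.unpack_from("<H", p_paddr_bytes,
                             E_PHNUM_OFF - P_PADDR_OFF)[0]

print(overlap, hex(e_phentsize), e_phnum)
```

The top half of the load address reads back as 0x20, which is 32, the size of one program header table entry.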

Now we have really left behind all pretenses of portability ...

$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
$ wc -c a.out
64 a.out

... but it works! And the program is twelve bytes shorter, exactly as predicted.

This is where I say that we can't do any better than this, but of course, we already know that we can -- if we could get the program header table to reside completely within the ELF header. Can this holy grail be achieved?

Well, we can't just move it up another twelve bytes without hitting hopeless obstacles trying to reconcile several fields in both structures. The only other possibility would be to have it start immediately following the first four bytes. This puts the first part of the program header table comfortably within the e_ident area, but still leaves problems with the rest of it. After some experimenting, it looks like it isn't going to quite be possible.

However, it turns out that there are still a couple more fields in the program header table that we can pervert.

We noted that p_memsz indicates how much memory to allocate for the memory segment. Obviously it needs to be at least as big as p_filesz, but there wouldn't be any harm if it was larger....

Secondly, it turns out that, contrary to every expectation, the executable bit can be dropped from the p_flags field, and Linux will set it for us anyway. Why this works, I honestly don't know -- maybe because Linux sees that the entry point goes to this segment? In any case, it works.

So, with these facts in mind, we can reorganize the file into this little monstrosity:

; tiny.asm

BITS 32

org 0x00001000

db 0x7F, "ELF" ; e_ident
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dw 2 ; e_type ; p_paddr
dw 3 ; e_machine
dd filesize ; e_version ; p_filesz
dd _start ; e_entry ; p_memsz
dd 4 ; e_phoff ; p_flags
_start:
mov bl, 42 ; e_shoff ; p_align
xor eax, eax
inc eax ; e_flags
int 0x80
db 0
dw 0x34 ; e_ehsize
dw 0x20 ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx

filesize equ $ - $$

The p_flags field has been changed from 5 to 4, as we noted we could get away with doing. This 4 is also the value of the e_phoff field, which gives the offset into the file for the program header table, which is exactly where we've located it. The program (remember that?) has been moved down to the lower part of the ELF header, beginning at the e_shoff field and ending inside the e_flags field.

Note that the load address has been changed to a much lower number -- as low as it can be, in fact. This keeps the value in the e_entry field to a reasonably small number, which is good since it's also the p_memsz number. (Actually, with virtual memory it hardly matters -- we could have left it at our original value and it would probably work just as well. But there's no harm in being polite.)
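To see the double duty each field is doing, the 52-byte image can be rebuilt in Python and decoded twice, once as an ELF header and once as a program header table entry. This is only a sketch of the byte layout: the seven machine-code bytes are what NASM is assumed to emit for the four instructions, the dictionary names are invented here, and nothing below says whether a given kernel will agree to load the result.

```python
import struct

LOAD = 0x00001000                 # the org value in the listing
FILESIZE = 52
CODE = bytes([0xB3, 0x2A,         # mov bl, 42
              0x31, 0xC0,         # xor eax, eax
              0x40,               # inc eax
              0xCD, 0x80])        # int 0x80

image = (
    b"\x7fELF"                                      # e_ident magic
    + struct.pack("<III", 1, 0, LOAD)               # p_type, p_offset, p_vaddr
    + struct.pack("<HH", 2, 3)                      # e_type, e_machine / p_paddr
    + struct.pack("<I", FILESIZE)                   # e_version / p_filesz
    + struct.pack("<I", LOAD + 32)                  # e_entry (_start) / p_memsz
    + struct.pack("<I", 4)                          # e_phoff / p_flags
    + CODE + b"\x00"                                # e_shoff, e_flags / p_align
    + struct.pack("<6H", 0x34, 0x20, 1, 0, 0, 0)    # e_ehsize .. e_shstrndx
)

EHDR_NAMES = ["e_ident", "e_type", "e_machine", "e_version", "e_entry",
              "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
              "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx"]
PHDR_NAMES = ["p_type", "p_offset", "p_vaddr", "p_paddr",
              "p_filesz", "p_memsz", "p_flags", "p_align"]

# Decode the same bytes as both structures.
ehdr = dict(zip(EHDR_NAMES, struct.unpack("<16sHHIIIIIHHHHHH", image)))
phdr = dict(zip(PHDR_NAMES,
                struct.unpack_from("<8I", image, ehdr["e_phoff"])))

for name in ("e_type", "e_machine", "e_entry", "e_phoff",
             "e_phentsize", "e_phnum"):
    print(f"{name:12s} {ehdr[name]:#x}")
for name in PHDR_NAMES:
    print(f"{name:12s} {phdr[name]:#x}")
```

In this decoding p_memsz equals the entry address, so it is larger than p_filesz, and p_align and p_paddr hold leftovers from the code and the e_type/e_machine pair.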

And so ...

$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
$ wc -c a.out
52 a.out

... and so, with both the program header table and the program itself completely embedded within the ELF header, our executable file is now exactly as big as the ELF header! No more, no less. And still running without a single complaint from Linux!

Now, finally, we have truly and certainly reached the absolute minimum possible. There can be no question about it, right? After all, we have to have a complete ELF header (even if it is badly mangled), or else Linux wouldn't give us the time of day!

Right?

Wrong. We have one last dirty trick left.

It seems to be the case that if the file isn't quite the size of a full ELF header, Linux will still play ball, and fill out the missing bytes with zeros. We have no less than seven zeros at the end of our file, and if we drop them from the file image:

; tiny.asm

BITS 32

org 0x00001000

db 0x7F, "ELF" ; e_ident
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dw 2 ; e_type ; p_paddr
dw 3 ; e_machine
dd filesize ; e_version ; p_filesz
dd _start ; e_entry ; p_memsz
dd 4 ; e_phoff ; p_flags
_start:
mov bl, 42 ; e_shoff ; p_align
xor eax, eax
inc eax ; e_flags
int 0x80
db 0
dw 0x34 ; e_ehsize
dw 0x20 ; e_phentsize
db 1 ; e_phnum
; e_shentsize
; e_shnum
; e_shstrndx

filesize equ $ - $$

... we can, incredibly enough, still produce a working executable:

$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
$ wc -c a.out
45 a.out

Here, at last, we have honestly gone as far as we can go. There is no getting around the fact that the 45th byte in the file, which specifies the number of entries in the program header table, needs to be non-zero, needs to be present, and needs to be in the 45th position from the start of the ELF header. We are forced to conclude that there is nothing more that can be done.
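The zero-fill behaviour described above can be imitated in Python to confirm that nothing is lost by dropping the trailing bytes. In this sketch the padding is modelled with bytes.ljust, which is an assumption standing in for whatever the kernel does with a short file; the machine-code bytes are those NASM is assumed to emit, and header_prefix is a made-up helper.

```python
import struct

LOAD = 0x00001000
EHDR_SIZE = 52
CODE = bytes([0xB3, 0x2A, 0x31, 0xC0, 0x40, 0xCD, 0x80])


def header_prefix(filesize):
    """First 44 bytes of the file: everything up to, not including, e_phnum."""
    return (
        b"\x7fELF"
        + struct.pack("<III", 1, 0, LOAD)          # p_type, p_offset, p_vaddr
        + struct.pack("<HH", 2, 3)                 # e_type, e_machine
        + struct.pack("<II", filesize, LOAD + 32)  # e_version, e_entry
        + struct.pack("<I", 4)                     # e_phoff / p_flags
        + CODE + b"\x00"                           # program, then padding
        + struct.pack("<HH", 0x34, 0x20)           # e_ehsize, e_phentsize
    )


# The 45-byte file: e_phnum is cut down to its low byte.
short = header_prefix(45) + b"\x01"

# Model the missing bytes as zeros and decode a full header from the result.
padded = short.ljust(EHDR_SIZE, b"\x00")
fields = struct.unpack("<16sHHIIIIIHHHHHH", padded)
e_phnum, e_shentsize, e_shnum, e_shstrndx = fields[10:14]

print(len(short), e_phnum, e_shentsize, e_shnum, e_shstrndx)
```

With the padding in place e_phnum still reads as 1, and the three section header fields read as 0, exactly as they did in the 52-byte version.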

--------------------------------------------------------------------------------

This forty-five-byte file is less than one-eighth the size of the smallest ELF executable we could create using the standard tools, and is less than one-fiftieth the size of the smallest file we could create using pure C code. We have stripped everything out of the file that we could, and put to dual purpose most of what we couldn't.

Of course, half of the values in this file violate some part of the ELF standard, and it's a wonder that Linux will even consent to sneeze on it, much less give it a process ID. This is not the sort of program to which one would normally be willing to confess authorship.

On the other hand, every single byte in this executable file can be accounted for and justified. How many executables have you created lately that you can say that about?

--------------------------------------------------------------------------------

Tiny
Software
Brian Raiter
Muppetlabs

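Wednesday, April 4, 2007

Prison Break recap

I'll keep finding more recap videos and posting them here.

The song in the clip is Mad World (the Chinese and English lyrics are available here); it is the theme song of Gears of War.

Prison Break III returns this fall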

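Trying out the Google Pinyin input method

The icon can be resized freely, and the settings panel looks a lot like the Sogou input method's. Word selection seems more accurate than Sogou's. After signing in to your Google account in the settings, you can upload your personal dictionary to the account and sync it back down when you use the input method on another machine, which is something the Sogou input method lacks.

Let's type a few lines of poetry and compare. In each pair, the first line is from Google Pinyin and the second is from Sogou Pinyin.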

抽刀断水水更流，举杯消愁愁更愁
抽刀断水水更流，具备小丑丑更丑

The Sogou output contains errors.

古人西祠黄鹤楼，烟花三月下扬州，孤帆远影碧空尽，唯见长江天际流
故人西辞黄鹤楼，烟花三月下扬州，孤帆远影碧空尽，未见长江天际流

Both contain errors.

春蚕到死丝方尽，蜡炬成灰泪始干
春蚕到死丝方尽，蜡炬成灰泪始干

Both are correct.

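Looking forward to Prison Break season 3

Michael has moved into a new home, SONA, so he no longer has to spend every day on the run, but the conditions there are worrying. So season 3 will no doubt be break out, and then run. Mahone has been locked up in the same place; whether he gets out is up to the writers. Sarah, the actress who plays Sara, is pregnant, so what happens in season 3 is still a mystery.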

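Tuesday, April 3, 2007

People's Public Security University of China 2007 graduate re-examination score cutoffs (name list) released

I've put it on Google Docs. Take a look at the scores to see where you need to put in the effort.

Continuing the Linux kernel journey

My job has settled down. The pay isn't high, but I enjoy the work and have a lot more free time than before. First I'll get the grade exam out of the way, then pick up the Linux kernel reading I started two years ago, again beginning with 0.11, reading and experimenting as I go.

I'll post the notes on the blog.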

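Monday, April 2, 2007

What’re ya doing?

Twitter can be thought of as the chit-chat version of a blog. When you want to jot down what you're doing right now, that's when you reach for Twitter: a few short sentences to express your mood. Looking back n years later, it may well turn out to be a treasure. It doesn't support Chinese yet.

See what I'm up to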

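Sunday, April 1, 2007

Fooled

Before I had even gotten out of bed, a classmate fooled me by text message. After getting up I found my site was inaccessible again, so I was fooled a second time.

Have you been fooled today?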
