Zhengziying's Blog: November 2015 Archives

双职工生活

Nov 30, 2015 3 Comments

郑轶嘉五个月大的时候就送去托儿所了。

我们也是不得已，没有其它的办法。我们两个把公司的产假都用足了，朱逢霖还额外请了一个月无薪的产假，这样才对付了一开始的五个月。因为种种原因，我和朱逢霖的父母都没法来美国帮忙带孩子。只有朱逢霖的妈妈来过，但朱逢霖出了月子她就回国去了。那之后的四个月就只有我们两个硬扛着。

等我们两个都回去上班了，就只能把郑轶嘉送day care了。不心疼是不可能的，才五个月大啊，都还没断奶。奶都是朱逢霖泵出来冻在冰箱里，每天早上拿几包出来带去day care，她们白天化冻了热一下给郑轶嘉喝。郑轶嘉在这家day care待了一个月，我们找到了一个姓杨的住家阿姨，就把郑轶嘉放家里了。

主要原因是郑轶嘉在那家day care睡的不好。那家day care的条件不能和Bright Horizons的比。Bright Horizons有专门的infant的一大间教室，到了nap的时候关上门，里面安安静静地。那家day care就开在一间普通民宅里，里面大大小小的小孩都有。郑轶嘉那时候每天还是要睡三觉的。一间房间里大一点的小孩在玩，另一间里郑轶嘉睡觉多少是受影响的。

我和朱逢霖都特别重视郑轶嘉的睡眠。我们相信，成年以后的很多睡眠问题的源头来自于婴儿阶段的神经系统的发育和睡眠习惯、睡眠能力的养成。我们也相信，好的睡眠质量能让小孩的更有专注力。相比之下，我们对他什么时候会爬、会坐、会走、会说话、会自己potty等等相对没有那么的在意。

2015-11-30-blog-yijia-nap

杨阿姨在我们家做了一年，做到郑轶嘉一岁半的时候。这中间朱逢霖的妈妈来过一次，待了几个月。朱逢霖的妈妈和杨阿姨差不多是同时走的，那时候Bright Horizons也正好有空位了。杨阿姨和朱逢霖的妈妈刚刚走的时候，我们一下子好不习惯。有了对比才深深的觉得她们在的时候生活好轻松的。至少家务活都不用做了：不用烧饭，不用洗衣服叠衣服，不用地毯吸尘。每天到家都有热乎乎的现成饭吃，脏衣服总是会洗得干干净净叠得整整齐齐的回到衣柜里去，水池里的脏碗也好像会自动的就干干净净的回到碗柜里去。

从那时到现在，两年多了，我和朱逢霖就一直是双职工生活。郑轶嘉这个月四岁了。这两年多里，我们仅有的帮手就是请了个钟点工，一个月四百，每周来两次，每次一两个小时，主要就是干些低频的家务，比如擦地板、吸地毯、清洁浴室和厕所什么的。要说不羡慕国内的人那是不可能的，国内的钟点工的工钱相对于我们在国内的工资来说便宜太多了，我们可以请一个小时工每天来做家务，还能把饭烧了。朱逢霖认识的一个印度人就因为这个原因回印度去了：在印度他能请一堆佣人。

家务都自己做，做做也就习惯了。而且还相对有点好处。比如郑轶嘉在厨房吃早饭的时候我就在边上顺便把洗碗机给unload了。这样总比郑轶嘉在吃早饭我在边上看手机好。郑轶嘉也学会了喝完牛奶把杯子直接放到水池里面。洗衣服叠衣服也变成了一项亲子活动。郑轶嘉特别喜欢帮我们把衣服从washer搬到dryer里面，还喜欢把dryer的门关上然后按"开始"按钮。干好的衣服我们经常喊郑轶嘉一起来叠。一开始是让他负责分类：把爸爸、妈妈和嘉嘉的衣服分来，各自堆成一堆。后来他看着我们叠衣服的样子也学着叠，喜欢叠自己的袜子，还不让我们插手。如果这些家务都被钟点工或者过来帮忙带孩子的爷爷奶奶给做了，小孩倒也就没有这样参与的机会了。

2015-11-30-blog-yijia-socks

其实在美国的中国人家庭，大部分的都有父母过来帮忙的。很多家里是爷爷奶奶半年，接着外公外婆半年，然后再爷爷奶奶半年，这样连续不断没有间隔的。还有些家虽然不是不间断的，至少也是每年有半年是有一方的父母过来帮忙的。说不羡慕那是不可能的。有爷爷奶奶或外公外婆在，很多时候可以轻松很多。我和朱逢霖经常会生出这样的感慨来：出去吃饭，有老人在的话可以帮忙看一下，我们自己可以吃顿安稳饭；晚上家里要是有个老人在，我们两个就可以出去，听个音乐会看个球赛什么的。虽然federal law和华盛顿州的法律都没有强制规定，但美国这边的惯例是不可以把12岁以下的小孩单独留在家里的。如果被发现被举报，最坏情况下，小孩是会被带走的。虽然说可以请babysitter，但一方面请一个晚上babysitter也要好几十块钱，另一方面babysitter毕竟不是亲人，小孩小的时候突然要跟一个陌生人待一个晚上，心理上还是满难的。包括那次去Las Vegas玩，晚上我们也只能留一个人在房间陪郑轶嘉，另一个人去看show。

所以没有父母在这边帮忙，夫妻两个会少了很多单独相处的时间，时间久了的确会有种感觉，就觉得自己整天不是忙工作就是围着孩子转。我和朱逢霖很早就预见到和意识到这个问题了，我们想了一些办法来弥补。比如说，我们约好每个月要找一个下午一起翘班出来，吃顿好吃的午饭，看场电影。郑轶嘉上的那家幼儿园，Bright Horizons，也有一个Parents Night Out的项目：每隔两三个月，这家幼儿园都会选一个周六，从下午四点到晚上十点，家长可以把小孩放在他们那里，他们提供小朋友晚饭，还配他们玩。因为是小朋友平时天天都去的幼儿园，老师也是平时的熟面孔，所以没有陌生感。我们觉得这个项目还挺好的。

双职工家庭没有老人帮忙，工作日的晚饭是个难题。我和朱逢霖后来摸索出来一套对我们家效果不错的做法。

首先，我们的电饭煲是可以定时的。早上出门前把米和水放好，定时定在17:30开始煮饭，这样到家就有新鲜出炉的热腾腾的米饭吃了。这样要好过早上时就把饭煮好，那样的话要保温保一天，到晚上吃的时候口感就不大好了。如果等到晚上到家再开始煮饭，那吃到饭就要很晚了。所以，能定时的电饭煲是双职工家必备的一个神器。另一个神器是慢炖锅。慢炖锅可以炖牛肉羊肉鸡肉什么的。一方面慢炖锅炖的比煎炒出来的要健康一些，吃着健康，油烟也少，另一方面慢炖锅能把肉炖酥了，否则如果要做个牛肉炖土豆，等到回到家再做，要么是煮不烂，要么就得等很久才能开饭。

除了使用可以定时的电饭煲和慢炖锅以外，为了能到家后尽快能吃上晚饭，我们家的菜谱也因此优化了，都是以容易准备容易烧的菜，但同时也兼顾了口味，使郑轶嘉有食欲，能多吃一些。我们做的比较多的是鱼。鱼容易做。我们一般早上出门前从冷冻室里拿一条鳊鱼或一袋带鱼出来，放在冰箱上层的冷藏室里化冻。这样做的好处是不用晚上到家再化冻，否则要么要花很多时间，要么就要用微波炉化冻。我们都觉得微波炉还是要尽量少用。鱼在冷藏室里化冻了以后，晚上我们一到家，第一件事情就是把鱼给蒸上，然后再搞蔬菜。蒸鱼不需要太多关注。煎炒的菜，时间长了会烧焦。蒸的时间稍微多了一点也问题不大，只要锅里水足够不蒸干掉，鱼是不会蒸焦掉的。

基本上我们现在每周五天工作日，一般在家里吃四天，到了星期五会在外面吃。我们基本不买外卖。我们是觉得外卖的东西不放心。倒也不是担心食品安全。美国的食品安全总体来说比国内的要好一点。但饭店里烧出来的菜，重油重盐的，不健康，能少吃点尽量少吃点。我和朱逢霖在这方面的观念是很相似的。我们也都很喜欢吃那些很好吃但不太健康的东西，比如烤羊肉串，小龙虾，水煮鱼，红烧肉，火锅，腌笃鲜，clam chowder，牛排。但每个人就只有这么一点点quota可以吃不健康的食物，超过quota就会三高就会影响健康。所以我们觉得把quota用在外卖的晚餐上是不太划算的。我们现在晚饭自己烧，牛肉都从Whole Foods买有机的（我们家基本不吃猪肉），鸡蛋无论是在Whole Foods、QFC还是Costco买都是买有机的，蔬菜也尽量是有机的，烧的时候少盐少油少高温。这两年我和朱逢霖体检的各项指标都正常，自己做饭做的比较健康是原因之一。

其实没有老人在这边帮忙，累是累了一点，不过也省掉了一些其它的烦恼。经常听身边的人说，也经常听朱逢霖说她在华人或mitbbs上看到，老人在这边帮忙带小孩，和孩子爸妈之间起了观念冲突。另外一些有老人帮忙带小孩的家庭里，老人太宠小孩了，小孩养成了一些不好的习惯，比如老人追着小孩喂饭之类的。这其实并不是在美国的中国人家庭独有的。在国内，和老人在同一个城市的，老人经常来帮忙的，也是类似问题的情况。我们家有点"因祸得福"的是，因为老人都来不了，也就不存在这些困扰了。

因为没有老人帮忙，郑轶嘉上托儿所也上得比较早，一岁半就上托儿所了。很多有老人帮忙的家庭，一般会等到两岁或三岁才送托儿所。我记得看到过一份研究报告，说小孩在一岁到两岁之间开始上幼儿园是最有利于小孩的社交能力和心智发育的。送托儿所送的晚的，可能就少了很多学习怎么和其他小朋友互动的机会。送幼儿园送的晚，一开始的几个礼拜适应起来也会更难一些。

双职工没老人帮忙，累是挺累的。不过在美国这边的中国人家庭，还有不少是只有一个人工作的。少一份收入，也挺累的。最幸福的当然是两份收入，还有老人过来帮忙。不过生活就像打牌，抓到手里的牌有好有坏，如果已经不能换牌了，那就用心把手里的牌打好。

After Automation Ate Testing

Nov 24, 2015 0 Comments

Huseyin Dursun, my previous manager, recently wrote a post “Automation eats everything ...”, in which he pointed out that manual validation has been eliminated and technology companies are no longer hiring engineers exclusively for testing role. That’s exactly what happened last year in my group, Microsoft Azure. We eliminated test and redefined dev and now we only have software engineers, who write both product code and test code.

Now we have eliminated manual validation and all tests are automated. What's next? My answer is: more automation. Here is a few areas that I see where we are/will be replacing other human work in the engineering activities with software programs.

1. Automation of writing test automation

Today, test automations are written by engineers. In the future, test automation will be written by software programs. In other words, engineers will write the code which writes test automation. One technique to consider is the model based testing. The idea of MBT has existed for nearly two decades and some companies (including teams in Microsoft, including my own teams) have tried and have got some successes. But by and large, it's very under-used, mainly because other things aren't there yet, like the scale, the demand, the maturity in other engineering activities^[1], the people, etc..

Another direction that people have been pursuing for at least a decade is the traffic bifurcation. The idea is to run the test instance as a shadow copy of the production instance, duplicate the production traffic to the shadow copy and see if it handles it in the same way as the production copy does. The bifurcation could be real time, or more in a record-and-replay fashion. Twitter’s Diffy is the latest work that I have seen in this direction. I guess there is a long way to go, especially when the SUT is very much stateful and its state has strong dependencies with the states in other downstream systems.

2. Behavioral contract enforcement

Using contracts to define system boundary and doing implementation against contracts is now very common. However, our contracts are mostly about the data schema: the API signature, the structure of the JSON object in the input parameters and response bodies, the RESTful API URL, the WSDL for XML Web Services, file format, response codes and error codes, ... These contracts don't carry much information about the behaviors: how will the entity transit through its state machine, whether an operation is going to be idempotent, whether I must call connection.Open() before doing anything else with it, etc.. In particular, the behaviors related to time. For example, this asynchronous operation is supposed to complete within N minutes; the system will perform this recurring operation every X days; ...

Today the behavioral contracts are mostly written (if ever written) in our natural languages in design specifications. The enforcement of such behavioral contracts are done in automated test cases. But there could be some fatal gaps in today's way. Our natural language is ambiguous. Test cases may not cover 100% what's written in and implied by the design specification. A more fundamental challenge is that the intention of the automated test cases may drift away as time goes by, meaning: our test automation code use to be able to catch a code bug, but after test code changes and refactoring, one day it will no longer be able to catch the same bug. I don't think we have a good way to detect and prevent such drift.

I believe the direction is to write the behavioral contract with some formal language, such as the TLA+ specification language created by Leslie Lamport. In a presentation last year, he explained how TLA+ works and how it's used in some real work. It seems pretty intriguing.

3. Automation of the analysis

In my previous team, as we made the automated tests faster, we found that now the long pole became the time human spent to make sense of the test result. So we developed some algorithms and tools to help us: 1) differentiate whether a failure is a new regression, or just a flaky test, 2) which failed tests are likely to share the same root cause. That was very helpful. In addition, we plan was to totally get rid of signoffs and let the software programs to make the call most of the time.

4. Automation of the workflow

Ideally once my code has left my desktop, the entire desktop-to-production journey should be led by software programs with no human participation (except for intervention/override). Today some companies are closer to that dream (e.g. Netflix's Spinnaker) and some other companies are farther away. Some smaller/simpler products may have already achieved it, but it remains a challenging thing for complex products. Today CI/CD is a lot more common in the software industry than ten years ago. But in my eyes today's CI/CD tools and practices more like the DHTML and AJAX things circa early 2000's. The jQuery/Bootstrap equivalent in CI/CD has yet to come.

5. Integration test in production

Besides replacing more human work with software programs, there is one more thing that we can do better in the test engineering: eliminate the test environment per se and perform all integration tests in production^[2]. Integration test is an inevitable^[3] phase between passing unit tests and getting exposed to real customers in production. Traditionally in integration tests, the SUT and most of its dependencies runs in the lab that are physically separated from the production instances. There are several big pain points in that approach: a) fidelity^[5], b) capacity, c) stability, d) support^[6]. Doing integration tests in production will make all these problems disappear. Needless to say, there are some challenges in this, mainly regarding product architect, security and compliance, isolation and protection, differentiation and equality, monitoring and alerting, etc.. I guess next time I will write a post about "The Design Pattern of Integration Testing in Production".

[1] For example, a team should invest in other more fundamental things like CI/CD before investing in building the model and doing MBT.
[2] "Testing in production" is a highly overloaded term. Someone uses it to refer to A/B testing. Sometime it means a late stage quality gate where the new version is rolled out to a small % of production and/or exposed to a small % of customers. "Integration test in production" is different on two things: i) it's for low quality code that is still under development, ii) it doesn't get exposed to customer.
[3] There are some strong opinions against integration tests. The lines like “integration test is a scam” help highlight some valid points. But practically we shouldn't throw the baby out with the bath water. I am strong believer of "pushing to the left" (meaning: put more tests in unit test and find issues earlier) but I too believe integration test has its place in the outer loop^[4]. Even though in the hindsight it might be very obvious that some bugs could have been caught by unit test, it would be a totally different thing when these bugs were unknown unknown.
[4] Outer Loop is defined as the stage between when an engineer has completed their check in and when it has rolled out to production. Depending on the product, this could mean App Store deployments (Mobile) or worldwide exposure (Services and modern Click to Run applications).
[5] Lab is different than production in many ways: configurations, security settings, networking, data pattern, etc. Those differences often hide bugs. Lab doesn't have all the hardware SKUs that production has, which significantly limits how much we can do in the lab in hardware related testing (e.g. drivers, I/O performance, etc.).
[6] Let's say the SUT depends on another service Foo. So traditionally in the integration test, we also have Foo instance(s) running in lab, too. When the lab instance(s) of Foo has any issue, the team of SUT will need the team of Foo to help check/fix. But that would be a lower priority for the team Foo, compared to the issues in the live site (production). Plus, the SLA (service level agreement) for lab instances is usually less than 24x7, but we want our integration tests to run all the time.

The Combined Engineering in Azure: A Year Later

Nov 21, 2015 0 Comments

Last year in Windows Azure^[1], we merged dev and test^[2] and switched to the combined engineering model^[3].

Recently I have been asked quite a few times about my view of that change. My answer was: it solved a few chronic problems in the traditional dev+test model. It solved these problems fairly easily and naturally. If we didn't do the combined engineering change, these problems would still be here today:

1. Quality is everyone's responsibility

We always said: quality is owned by everybody, not just the test team. In the reality, there were always some gaps, more or less. Some developers still had the mentality of "the test team would/should find the bug for me". Now there is no test team. Software engineers can count on nobody but themselves.

2. Improve testability

Although nobody disagreed with the importance of testability design, often times testability is treated as relatively lower priority by the developers in the traditional dev+test model. When they were under the time pressure, they naturally get the feature implemented first and it took long time for some testability requirements getting honored. The worse was that the developers didn't have the sense of testability in their mind when they design and write code. Quite some testability issues were found in pretty late stage when it's too costly/risky to change the design and code.

Now writing test code is a part of the software engineer's job. They have much strong incentive to improve testability because it will make their own work easier. Plus, they truly learn the lessons of poor testability designs because it hurts themselves.

No more begging to the developers to add an API for my test automation to poll to replace a hard-coded Sleep(10000).

3. Push tests to the left

I had hard time to convince some developers to write more unit tests. This is a true story: a dev in my team wrote a custom lock. I found that there was little unit test of that lock. I asked the dev. He told me he think the scenario tests^[4] has already covered it pretty well. I didn't know what to say. Yes we had code coverage data for unit test. But the hall of shame can only go this far.

Now developers (software engineers) own all the tests. Now they have all the incentives to push the tests to the left^[5]: put as much as tests in unit test, because it's fast, easy to debug and nearly free of noises. The integration test is obviously a less favorable place to put the test: it's slow, more hassle to debug and more noisy.

4. Hiring and retention

That was really, really, really a challenge all the time. Most college graduates prefer SDE than SDET^[6]. Partly because they had little exposure to what the SDET job is about. Partly because they are concerned with the "test" tag. Valid concern. Among the industry candidates, many of those who came from software testing background usually didn't meet our requirement of coding and problem skills, because in many places outside Microsoft, test engineers were mainly doing what the STE^[7] used to do in Microsoft. We ended up having to put a lot of effort in convincing developers from other companies to join Microsoft as SDET, which wasn't an easy sell.

Now, voila, problem solved. There is no more "test" tag. Everyone is "Software Engineer". No more SDET wants to switch to SDE to get rid of the "test" tag, because there is no more SDET.

5. Planning and resourcing

We used to do our planning based on dev estimate only. It was understandable. It's much messier to juggle if every work item has two prices (dev estimate and test estimate). In planning, we assume that for every work item, the test estimate is proportional to the dev estimate (e.g. 1:2, which came from our total test:dev ratio) and we believe the variances in each individual work item will average out. It worked OK most of the time. But there were several times where such model cause significantly under-funded test resources and caused crunch in late stage in the project.

Now when engineering managers and software engineers provide work estimate, the price tag has already included both dev estimate and test estimate. Nobody would underestimate the test cost because otherwise they would have to pay for it anyway.

To summarize, that's the power of the roles and responsibility model. In the past, I was the cook at our home and my wife usually do the cleanup. She always complained that I made the stove and counter-top very messy. Later we made a change: I do both cooking and cleanup (and she took some other housework from me). Then all of sudden I paid a lot of attention to not make kitchen messy because otherwise it would be myself that spend time to clean it up.

p.s. Of course there is also the downside of this change. That would be another topic. But the net is a big plus.

[1] I know I should have called it "Microsoft Azure" rather "Windows Azure". It's just the old habit. For us who joined Azure in its early years, we still call it Windows Azure.
[2] Before the merge, we had dev team and test team. Take myself as an example. I was the test manager leading the test team, partnering with the dev manager who led the dev team. My test team was about half of the size of the dev team. In the shift to combined engineering model, we simply merged and became one engineering team of about 70+ people.
[3] Strictly speaking, our shift to the combined engineering did not only include merging the dev and test, but also redefined the role of PM, which now lean toward the market, customer and competition more than the internal engineering activities, and enlarged the role of the new "software engineer" role （which started from the sum of original dev+test) by adding more DevOps responsibilities.
[4] We didn't differentiate these terms: scenario test, functional test, e2e test, integration test. Our dev did help write quite some functional/scenario tests when test team was running tight. But by and large, the test team owned everything after unit test.
[5] We usually draw a timeline on the whiteboard, from left to the right: the developer changes code in his local repo -> unit test -> other pre-checkin tests -> checkin -> integration tests -> start production rollout -> rollout completed. So "push tests to the left" means push them into the unit test.
[6] SDE = Software Development Engineer. SDET = Software Development Engineer in Test (aka "tester").
[7] STE = Software Test Engineer. Microsoft had this job title until 2005/2006. STE's main job responsibility was writing test spec, enumerating test cases, execute test cases (mainly manually), exploratory tests, etc.. Many STEs had very good analytical skills, knowledgeable of our product and good soft skills, but relatively weak in coding, debugging, design, etc..

人生没有A/B Testing

Nov 20, 2015 1 Comments

A/B testing是互联网公司常用的一种手段，用来帮助在两种不同的方案中做出选择，比如：这个按钮放左边好还是放右边好，字体用11磅的好还是12磅的好，等等。做A/B testing的时候，产品组会随机抽取一小部分用户，比如总用户的10%，然后把其中的一半（就是5%）放到A组，把另一半放到B组。这些用户在打开网站或App的时候，A组的会看到按钮在左边，B组会看到按钮在右边。然后产品组看一下数据。因为这些用户都是随机抽取的，如果A组买的东西多逗留时间长，那就是A方案好。反之，就是B方案好。

可惜在人生大事上没法做A/B testing。比如，到底是中国好还是美国好？要移民么？

理论上来说这种A/B testing要做也是可以做的。我们可以随机找10万个人，随机分成两组。A组五万个人留国内，B组五万个人移民去美国，十年后调查一下这两组人分别过的怎么样。首先不说安排五万个随机抽取的人移民美国的难度。就算这些能搞定，十年后做调查的时候，怎么来衡量“过的怎么样”呢？事业、家庭、金钱、健康、幸福感，这些都要考虑，但各给多少权重呢？

就算最后结果出来了，但那时候的中国已经不是十年前的中国了，美国也不是十年前的美国了，过去十年的试验结果对未来十年已经没多少可参考性了。

既然A/B testing指望不上，那就只能靠不那么具有客观性、科学性的材料做参考了：听听别人怎么说的，看看别人写的心得。每一个单个的人所说的和所写的都或多或少是盲人摸象。所以兼听则明是需要的。但外面的那些文章，大量的是以讹传讹。而很多写亲身经历的，难免掉入“距离产生美”的陷阱。夹叙夹议的，往往会变成“小马过河”：小牛说河水很浅，小羊说河水很深。其实河水就是这点深，到底是太深还是太浅取决于过河的那个人的自身。小马只有自己去河里过一过，才知道河水对它来说是太深还是太浅。毛主席说，梨子的味道要尝过才知道。可是浅尝是不够的。靠出差、旅游和短住是不足以真正了解这只梨子的滋味的。

说到这里，我也不知道我想要说的是什么。

If You Pay Later, You Pay More

Nov 19, 2015 0 Comments

One of my previous managers used to tell us "You either pay now or pay later. If you pay later, you pay more". Years have passed and I have seen how true it is for an engineering team.

The dilemma is: the one who chooses not to pay now may not be the same one who pays later. Why would I pay now, so that someone else wouldn't pay more later? It's natural thing that we make selfish choices, unless there is something to counter balance it.

Here is an example, a real live site incident that happened recently. Our customer couldn't start the virtual machine from the management portal. The cause was in the following code, it threw NullReferenceException because roleInstance.Current was null:



foreach (RoleInstance roleInstance in this.RoleInstances)

{

    int currentUpdateDomain = (int)roleInstance.Current

                                               .Container

                                               .ServiceUpdateDomain;

    //...

}

When the developer pressed "." after roleInstance.Current, he probably didn't pause and ask himself: would the Current always be not null? He probably didn't spend time to read the related code a bit to find out and put extra code there for safety (e.g. "if(roleInstance.Current!=null)"). If he did all these (the pause, the code reading and the additional code), he would be slower. But that would have saved so much more associated with the live site incident: people's time spent on investigate the incident, the time to rollout the hotfix, and the time to handle the (unhappy) customer. But those time is not the developer's time. By cutting some corners, he probably got a few more work items done. Thus, he probably got a somewhat better performance review and promoted a bit sooner. Then he moved on and leave the team behind to "pay later but pay more".

Our performance review model doesn't help, either. In the annual review cycle, we barely can hold people accountable for something they did more than a year ago. Once bonus are paid and promotions are done, unless it's something really bad (like causing the subprime crisis), we are not going to take the bonus back or revert the promotion.

Among the things that we can do, one thing I did was to keep my team members' ownership unchanged^[1] for long time (e.g. two years, if not more) and told them so upfront. The benefits are:

By fixing people on the same thing for longer time, the one who chooses not to pay now would be more likely the same person who will pay later (and pay more).
By telling them so upfront, it does not only counter-balances the shortsighted cutting-corners, but also encourages the right behavior and investments in their areas that will lead to long-term successes. It's like if I know I am going to live in this house for at least five years, I will spend the first year cleaning up the weeds and fixing the irrigation system in the backyard, then plant the plum trees in the second year and keep fertilizing and take good care of it in the third and fourth year, so that from the fifth year onward, I get to eat the sweat plums while enjoying the sun and breeze in my backyard.

That's why re-org has a downside. In some companies where a re-org happens every 18-24 months, although the organizations get to more frequently optimize their structure and alignment, it also sets a norm that discourage long-term investments and successes: why bother planting the plum trees if I know I am going to move to another house every 18-24 months?

As Reid Hoffman said: "Good managers know that it’s difficult to achieve long-term success without obtaining long-term commitments from employees."

[1] I usually did it in a mixed way: some fixed ownership + some flexibility of changing projects once a while.

They Are Not Tech Companies

Nov 8, 2015 0 Comments

I was listening to a podcast lately and they were talking about a tech startup, Wevorce, which disrupts the divorce market:

A system that works by attracting couples to the service, collecting data on them through an initial survey, and using their results to classify each person as a particular divorce "archetype."

Then, the Wevorce team of counselors, family planners, and lawyers steps in. They use their research, data, and training to mediate at predictable moments of tension — a processing system kind of like TurboTax or H&R Block.

How is that a tech company? What is the tech here? Is filling an online survey considered "using technology"? To me, that is a law company. A law startup. Not a tech start up. I fill a survey form when I visit a physical therapist for the first time. If that form is done online and they have an algorithm to analyze my profile to recommend the best therapist and treatment plan, is the hospital considered a tech company? Of course not.

To me, tech companies are those who advance the technologies and make innovations in technology. If a company makes innovation in another trade, using the help from the latest technologies, it's not a tech company. For example, Blue Apron is not a tech company. They are a meal kit company. It is still a great startup, a great business innovation. I am a customer and I like it.

For the same reason, Instacart, of which I am a customer too, is not a tech company either. They do provide a new experience of buying groceries. But at the end of the day, they are a grocery store. An online grocery store. Putting a storefront online and providing an app for customers to place order doesn't make it a tech company. ToysRUs sells toys online, but no one calls ToysRUs a tech company.

They are not tech companies also because the technology is not the key ingredient to found those companies and make them successful businesses. A tech person (like me) don't have know-how in those business sectors. Instacart? Maybe OK. But definitely not Wevorce or Blue Apron. Wevorce was founded by a family lawyer and Blue Apron was started by a chef and a VC.

In these cases, technology (mobile, data, etc.) is more like the enabler and catalyst. Technology can give these companies an edge over the disruptees in the trade. But if they don't get the core of their trade right, technology won't matter. If the spinach in Blue Apron's big box had already wilted when it arrives my door steps, if they recipes tasted no much difference than average family meals, they would not have been successful.

Don't take me wrong. Instacart and Blue Apron are still awesome business innovations. Just don't call them tech companies any more.