
Saturday Morning Breakfast Cereal - Median



Click here to go see the bonus panel!

Hovertext:
What's that gray blobby thing at the bottom, you ask? That's a folded pile of flattened dead plants that once conveyed news from far-away lands.



Orca: differential bug localization in large-scale services


Orca: differential bug localization in large-scale services Bhagwan et al., OSDI’18

Earlier this week we looked at REPT, the reverse debugging tool deployed live in the Windows Error Reporting service. Today it's the turn of Orca, a bug localisation service that Microsoft have in production use for six of their large online services. The focus of this paper is on the use of Orca with 'Orion,' where Orion is the codename given to a 'large enterprise email and collaboration service that supports several millions of users, runs across hundreds of thousands of machines, and serves millions of requests per second.' We could call it 'Office 365' perhaps? Like REPT, Orca won a best paper award (meaning MSR scooped two of the three awards at OSDI this year!).

Orca is designed to support on-call engineers (OCEs) in quickly figuring out the change (commit) that introduced a bug to a service, so that it can be backed out (fixes can come later!). That's a much harder task than it sounds in highly dynamic, fast-moving environments. In 'Orion', for example, there are many developers concurrently committing code. Post-review, the changes are eligible for inclusion in a build. An administrator periodically creates new builds combining multiple commits. A build is the unit of deployment for the service, and may contain anywhere from one to hundreds of commits.

There’s a staged roll-out process where builds move through rings. A ring is just a pre-determined set of machines that all run the same build. Builds are first deployed onto the smallest ring, ring 0, and monitored. When it is considered safe the build will progress to the next ring, and so on until it is finally deployed world wide.

Roughly half of all Orion’s alerts are caused by bugs introduced through commits. An OCE trying to trace an alert back to a commit may need to reason through hundreds of commits across a hierarchy of builds. Orca reduced the OCE workload by 3x when going through this process. No wonder then that it seems to be spreading rapidly within Microsoft.

Example post-deployment bugs

Over a period of eight months, we analyzed various post-deployment bugs and the buggy source-code that caused them. Table 2 (below) outlines a few characteristic issues.

The commits column in the table above shows the number of candidate commits that an OCE had to consider while triaging the issue. Post-deployment bugs fall predominantly into the following categories:

  • Bugs specific to certain environments (aka ‘works for me!’)
  • Bugs due to uncaptured dependencies (e.g. a server-side implementation is modified but the corresponding client change has not been made)
  • Bugs that introduce performance overheads (that only emerge once a large number of users are active)
  • Bugs in the user interface, whereby a UI feature starts misbehaving and customers complain.

Studying the bugs and observing the OCEs gave a number of insights that informed Orca’s design:

  • Often the same meaningful terms occur in both the symptom and the cause. E.g., in bug no. 1 in the above table the symptom is that the People Suggestion feature has stopped working, and the commit introducing the problem contains a variable named 'suggest'.
  • Testing and anomaly detection algorithms don’t always find a bug immediately. A bug can start surfacing in a new build, despite being first introduced in a much older build. For example bug 3 in the table appeared in a build that contained 160 commits, but the root cause was in the previous build with 41 commits.
  • Builds may contain hundreds of commits, so manually attributing bugs can be a long task.
  • There are thousands of probes in the system, and probe failures and detections are continuously logged.

Based on these insights, Orca was designed as a custom search engine that uses the symptom as the query text. A build provenance graph is maintained so that Orca knows the candidate set of commits to work through, and a custom ranking function helps rank commits in the search results based on a prediction of the risk in each commit. Finally, the information already gathered from logged probe failures gives a rich indication of likely symptom search terms, and these can be used to track how frequently terms appear across queries, yielding an Inverse Query Frequency, IQF (cf. Inverse Document Frequency, IDF), for use in search ranking.
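As a rough illustration of the IQF idea (this is my own sketch, not code from the paper: the alert list, function name, and the IDF-style log weighting are all assumptions), a term-weight table could be built from historical alert text along these lines:

```python
import math
from collections import Counter

def inverse_query_frequency(historical_queries):
    """IQF(t) = log(N / n_t): N historical symptom queries, n_t of which
    contain term t. Analogous to IDF, but computed over past queries
    (here, logged alerts) rather than over documents."""
    n_queries = len(historical_queries)
    containing = Counter()
    for q in historical_queries:
        containing.update(set(q.lower().split()))
    return {t: math.log(n_queries / n) for t, n in containing.items()}

# Hypothetical alert texts standing in for the millions of logged alerts.
alerts = [
    "people suggestion feature stopped working",
    "mailbox migration failed with timeout",
    "people search latency regression",
]
iqf = inverse_query_frequency(alerts)
print(iqf["suggestion"] > iqf["people"])  # rarer terms carry more weight -> True
```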

In addition to being used by OCEs to search symptoms, multiple groups have also integrated Orca with their alerting system to get a list of suspect commits for an alert and include it in the alert itself.

How Orca works

The input query to Orca is a symptom of the bug… The idea is to search for this query through changes in code or configurations that could have caused this bug. Thus the “documents” that the tool searches are properties that changed with a commit, such as the names of files that were added, removed or modified, commit and review comments, modified code and modified configuration parameters. Orca’s output is a ranked list of commits, with the most likely buggy commit displayed first.

The first step is to tokenize the terms from the presenting symptoms, using heuristics specially built for code and log messages (e.g. splitting large strings according to Camel-casing). In addition to single tokens, some n-gram tokens are also created. After tokenization, stop words are filtered out, as well as commonly used or irrelevant terms found through analysing about 8 million alerts in Orion’s log store (e.g., the ‘Exception’ postfix on class names).
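To make the tokenisation step concrete, here is a rough sketch of the kind of processing described above (mine, not the paper's; the stop-word list, the regular expressions, and the n-gram size are assumptions):

```python
import re

# 'Exception' is filtered as a commonly used class-name postfix, per the paper.
STOP_WORDS = {"the", "a", "an", "is", "of", "exception"}

def camel_split(token):
    """Split CamelCase / PascalCase identifiers into their component words."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", token)

def tokenize(symptom, max_ngram=2):
    """Tokenize a symptom string: split on non-alphanumerics, break up
    camel-cased identifiers, drop stop words, and add n-gram tokens."""
    raw = re.split(r"[^A-Za-z0-9]+", symptom)
    words = [w.lower() for tok in raw if tok for w in camel_split(tok)]
    words = [w for w in words if w not in STOP_WORDS]
    tokens = list(words)
    for n in range(2, max_ngram + 1):
        tokens += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return tokens

print(tokenize("PeopleSuggestionProvider threw NullReferenceException"))
```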

Now we need to find the set of commits to include in the search. Orca creates and maintains a build provenance graph. The graph captures dependencies between builds and across rings.

Builds that are considered stable in a given ring can be promoted to the next ring, which gives rise to an inter-ring edge. Interestingly, not all fixes roll forward through this process. A critical bug found in a build in a later ring can be fixed directly in that ring, with the fix then back-ported to earlier rings. Orca uses the graph to trace a path back from the build in which the symptom is exhibiting to the origin build in ring 0. The candidate list of commits to search includes all commits in the builds on this path, together with any back-ported commits.
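A minimal sketch of what tracing the provenance graph might look like; the Build structure and its field names here are hypothetical, invented purely for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Build:
    name: str
    ring: int
    commits: list                   # commits first introduced in this build
    parent: "Build | None" = None   # build it was promoted from (inter-ring edge)
    backported: list = field(default_factory=list)  # fixes ported from later rings

def candidate_commits(build):
    """Walk the provenance graph from the build showing the symptom back to
    its ring-0 origin, collecting every commit on the path plus back-ports."""
    commits, node = [], build
    while node is not None:
        commits += node.commits + node.backported
        node = node.parent
    return commits

ring0 = Build("b100.0", ring=0, commits=["c1", "c2"])
ring1 = Build("b100.1", ring=1, commits=["c3"], parent=ring0, backported=["hotfix-7"])
print(candidate_commits(ring1))  # ['c3', 'hotfix-7', 'c1', 'c2']
```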

Given the set of commits, Orca uses differential code analysis to prune the search space. ASTs of the old and new versions of the code are compared to discover relevant parts of the source that have been added, removed and modified in a commit. The analysis finds differences in classes, methods, references, conditions, and loops. For all of the changed entities, a heuristic determines what information to include in the delta. For example, if two lines in a function have been changed the diff will include the entire text of the two changed lines (old version and new version) as well as the name of the function. This approach catches higher-level structures that a straight lexical analysis of the diff would miss.
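The paper applies this analysis to the service's own source languages; as a rough, language-shifted sketch of the same idea, here is a Python-on-Python version using the standard ast and difflib modules, reporting each changed function's name together with its changed lines:

```python
import ast
import difflib

def changed_functions(old_src, new_src):
    """Compare two versions of a Python module and report, per changed
    function, its name plus the old/new text of the changed lines.
    (A rough analogue of differential code analysis; the real thing covers
    more entity kinds: classes, references, conditions, loops.)"""
    def functions(src):
        tree = ast.parse(src)
        return {n.name: ast.get_source_segment(src, n)
                for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}

    old_funcs, new_funcs = functions(old_src), functions(new_src)
    delta = {}
    for name in old_funcs.keys() & new_funcs.keys():
        if old_funcs[name] != new_funcs[name]:
            diff = difflib.unified_diff(old_funcs[name].splitlines(),
                                        new_funcs[name].splitlines(),
                                        lineterm="", n=0)
            delta[name] = [l for l in diff
                           if l[:1] in "+-" and l[:3] not in ("+++", "---")]
    # Added / removed functions are part of the delta too.
    delta.update({n: ["added"] for n in new_funcs.keys() - old_funcs.keys()})
    delta.update({n: ["removed"] for n in old_funcs.keys() - new_funcs.keys()})
    return delta

old = "def suggest(user):\n    return top_contacts(user)\n"
new = "def suggest(user):\n    return top_contacts(user, limit=0)\n"
print(changed_functions(old, new))
```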

The output of the differential code analysis is then searched for the tokens from the tokenised symptom description, using TF-IQF as a 'relevance' score. These scores are first computed on a per-file basis and then aggregated to give an overall score for the commit. In the evaluation, 'max' was found to work well as the aggregation function.
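Putting the pieces together, a toy version of the scoring step might look like this (the function names and the exact TF times IQF product are my assumptions; the paper's formula may differ in detail):

```python
from collections import Counter

def tf_iqf(query_tokens, file_tokens, iqf, default_iqf=1.0):
    """Per-file relevance: term frequency of each query token in the file's
    changed text, weighted by that token's inverse query frequency."""
    tf = Counter(file_tokens)
    return sum(tf[t] * iqf.get(t, default_iqf) for t in query_tokens)

def commit_score(query_tokens, commit_delta, iqf):
    """Aggregate per-file scores into one commit score; the paper found
    'max' to work well as the aggregation function."""
    return max((tf_iqf(query_tokens, toks, iqf) for toks in commit_delta.values()),
               default=0.0)

# Hypothetical commit delta: tokens per changed file, from the code analysis step.
commit_delta = {
    "PeopleSuggestionProvider.cs": ["people", "suggestion", "provider", "limit"],
    "Telemetry.cs": ["counter", "flush"],
}
query = ["people", "suggestion", "stopped", "working"]
print(commit_score(query, commit_delta, iqf={"suggestion": 2.3, "people": 1.1}))
```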

We can now return the ranked search results. However, the authors found that very often multiple commits had the same or very similar scores. To break ties between commits with the same scores, a commit risk prediction model is used.

Commit risk prediction

We have built a regression tree-based model that, given a commit, outputs a risk value for it which falls between 0 and 1. This is based on data we have collected for around 93,000 commits made over 2 years. Commits that caused bugs in deployment are labeled 'risky' while those that did not, we labeled 'safe.' We have put considerable effort into engineering the features for this task…

This is a super-interesting area in its own right. The paper only includes a brief summary of the main features that inform this model:

  • Developers who are new to the organisation and the code base tend to create more post-deployment bugs. So there are several experience-related features in the model.
  • Files mostly changed by a single developer tend to have fewer bugs than files touched by several developers. So the model includes features capturing whether a commit includes files with many owners or few.
  • Certain code-paths when touched tend to cause more post-deployment bugs than others, so the model includes features capturing this.
  • Features such as file types changed, number of lines changed, and number of reviewer comments capture the complexity of the commit.
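As a sketch of the shape of such a model (a regression tree over per-commit features), here is a toy version using scikit-learn. The feature names, the numbers, and the library choice are all my assumptions; the paper's real model uses a much richer feature set and far more data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical per-commit features echoing the categories above:
# [developer_tenure_months, distinct_file_owners, touched_hot_path (0/1),
#  files_changed, lines_changed, reviewer_comments]
X = np.array([
    [3,  5, 1, 12, 400, 2],   # new developer, many owners, hot code path
    [48, 1, 0,  2,  15, 6],   # veteran, single-owner files
    [6,  4, 1,  9, 250, 1],
    [60, 2, 0,  3,  30, 4],
])
y = np.array([1.0, 0.0, 1.0, 0.0])  # 1 = caused a post-deployment bug

model = DecisionTreeRegressor(max_depth=3).fit(X, y)
new_commit = np.array([[2, 6, 1, 20, 800, 0]])
print(model.predict(new_commit))  # risk score in [0, 1], used to break ranking ties
```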

Evaluation

Since its deployment for ‘Orion’ in October 2017, Orca has been used to triage 4,400 issues. The authors collected detailed information for 48 of these to help assess the contribution that Orca makes to the process. Recall that an OCE is presented with a symptom and may have to search through hundreds of commits to find the culprit. The following table shows how often Orca highlights the correct commit in a top-n list of results (look at the top line, and the ‘ALL’ columns):

As deployed, Orca presents 10 search results. So the correct commit is automatically highlighted to the OCE as part of this list in 77% of cases. The build provenance graph contributes 8% to the recall accuracy, and the commit risk-based ranking contributes 11%.

The impact on overall OCE workload is significant. The following chart shows the number of expected commits an OCE investigates with and without Orca for the 48 issues in the study.

… using Orca causes a 6.5x reduction in median OCE workload and a 3x (67%) reduction in the OCE’s average workload… For the 4 bugs that were caught only because of the build provenance graph, the OCE had to investigate an average of 59.4 commits without Orca, and only 1.25 commits with it. This is a 47.5x improvement.

A closing thought:

Though we describe Orca in the context of a large service and post-deployment bugs, we believe the techniques we have used also apply generically to many Continuous Integration / Continuous Deployment pipelines. This is based on our experience with multiple services that Orca is operational on within our organization.




How the Mercator Projection Distorts the True Sizes of Countries on Maps


Data scientist Neil Kaye made this map to show how much the popular Mercator projection distorts the sizes of many countries, particularly those in the Northern Hemisphere.

Mercator Adjusted

The distortion in the animated version is even clearer. Key takeaway: Africa is *enormous*.

See also the true size of things on world maps.

True Size Map

Tags: maps   Neil Kaye
1 public comment
MotherHydra, 34 days ago (Space City, USA):
Low information students forgot that map distortion was taught in school.

Google and Uber

jwz
How are these murderous sociopaths not in jail?

"If it is your job to advance technology, safety cannot be your No. 1 concern," Levandowski told me. "If it is, you'll never do anything. It's always safer to leave the car in the driveway. You'll never learn from a real mistake."

Levandowski had modified the cars' software so that he could take them on otherwise forbidden routes. A Google executive recalls witnessing Taylor and Levandowski shouting at each other. Levandowski told Taylor that the only way to show him why his approach was necessary was to take a ride together. The men, both still furious, jumped into a self-driving Prius and headed off.

The car went onto a freeway, where it travelled past an on-ramp. According to people with knowledge of events that day, the Prius accidentally boxed in another vehicle, a Camry. A human driver could easily have handled the situation by slowing down and letting the Camry merge into traffic, but Google's software wasn't prepared for this scenario. The cars continued speeding down the freeway side by side. The Camry's driver jerked his car onto the right shoulder. Then, apparently trying to avoid a guardrail, he veered to the left; the Camry pinwheeled across the freeway and into the median. Levandowski, who was acting as the safety driver, swerved hard to avoid colliding with the Camry, causing Taylor to injure his spine so severely that he eventually required multiple surgeries.

The Prius regained control and turned a corner on the freeway, leaving the Camry behind. Levandowski and Taylor didn't know how badly damaged the Camry was. They didn't go back to check on the other driver or to see if anyone else had been hurt. Neither they nor other Google executives made inquiries with the authorities. The police were not informed that a self-driving algorithm had contributed to the accident.

Levandowski, rather than being cowed by the incident, later defended it as an invaluable source of data, an opportunity to learn how to avoid similar mistakes. He sent colleagues an e-mail with video of the near-collision. Its subject line was "Prius vs. Camry." (Google refused to show me a copy of the video or to divulge the exact date and location of the incident.) He remained in his leadership role and continued taking cars on non-official routes.

According to former Google executives, in Project Chauffeur's early years there were more than a dozen accidents, at least three of which were serious. One of Google's first test cars, nicknamed kitt, was rear-ended by a pickup truck after it braked suddenly, because it couldn't distinguish between a yellow and a red traffic light. Two of the Google employees who were in the car later sought medical treatment. A former Google executive told me that the driver of the pickup, whose family was in the truck, was unlicensed, and asked the company not to contact insurers. kitt's rear was crushed badly enough that it was permanently taken off the road.

In response to questions about these incidents, Google's self-driving unit disputed that its cars are unsafe. "Safety is our highest priority as we test and develop our technology," a spokesperson wrote to me. [...]

As for the Camry incident, the spokesperson [said that] because Google's self-driving car did not directly hit the Camry, Google did not cause the accident.

These words actually came out of this creature's mouth, on purpose, when it knew that humans could hear it speaking:

"The only thing that matters is the future," [Levandowski] told me after the civil trial was settled. "I don't even know why we study history. It's entertaining, I guess -- the dinosaurs and the Neanderthals and the Industrial Revolution, and stuff like that. But what already happened doesn't really matter. You don't need to know that history to build on what they made. In technology, all that matters is tomorrow."

Previously, previously, previously, previously, previously.

3 public comments
awilchak, 13 days ago (Brooklyn, New York):
Nope
mkalus, 33 days ago (iPhone: 49.287476,-123.142136):
W T F.
jimwise, 34 days ago:
😳

Starting with a mea culpa: unfortunately I am responsible for a good part of all this ...

Starting with a mea culpa: unfortunately, I am responsible for a good part of this whole mess. In the 90s I had a website and a digital fanzine, and I helped create Brazil's internet culture. At the start of the century I worked building digital-inclusion devices, and later I was a developer on Orkut, the social network that taught Brazilians how to use the internet.
I did all of this innocently; I genuinely believed the internet would bring more information to everyone, and that this would make the world better. Today I see that I was naive: I underestimated the role of false information, and how the internet enabled mass manipulation in a far more efficient way.
But if you'll allow me, I think I know where things went wrong, and I have a suggestion for improving them. I think the heart of the problem is *private broadcast*.
Broadcast was never a problem on the internet. People sent messages to mailing lists and posted in Orkut communities, and in those media fake news never had the power it has today.
What changed is that nowadays you can broadcast to closed groups (like the family WhatsApp group). That was the game changer. When you posted to a mailing list or an Orkut community you were open to rebuttal: if you post X, someone can always post not-X right below it. In terms of accumulated evidence, things cancel out.
But the family WhatsApp group is different, because it is a network of trusted contacts. You trust your relatives. When a message arrives in that group, you tend to trust it by default because you trust whoever sent it, and that trust in the sender transfers to the message.
And so a trust network forms. It doesn't matter that the original sender is a criminal in jail, writing from a phone that was smuggled into his cell up his ass. If his message passes from group to group, it gains trust along the way, because each private broadcast increases its credibility. The trust comes from the path, not from the content.
The easy solution would be to ban private broadcast. No more WhatsApp groups, or at least no share button for private groups: if you want to pass something along, you have to rewrite it. By increasing the friction in the sharing process, false messages have a harder time propagating.
Another solution would be to track the trustworthiness of the message itself. I have no idea how to implement this, but each message would need a score measuring the trustworthiness of its origin. A message that travelled unaltered from a primary source would have a high score; a message sent from a phone pulled out of someone's ass would have a low score.
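The author says he has no idea how to implement this; purely as one possible reading of the idea, here is a toy sketch in which a message's displayed trust is anchored to its origin and can only decay as it is forwarded unmodified between private groups (all names and the decay rule are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    origin_score: float   # trust assigned to the original source (0..1)
    hops: int = 0         # private-group forwards since the origin

def forward(msg):
    """Forwarding into a private group should not add credibility; here each
    unmodified forward only increments the hop count."""
    return Message(msg.text, msg.origin_score, msg.hops + 1)

def displayed_trust(msg, decay=0.8):
    """Trust shown to the reader: anchored to the origin, decaying per hop."""
    return msg.origin_score * (decay ** msg.hops)

m = Message("breaking news ...", origin_score=0.2)   # low-trust origin
for _ in range(4):
    m = forward(m)
print(round(displayed_trust(m), 3))  # stays low no matter how many groups it crossed
```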
But I don't know whether either of these alternatives is economically viable, given that the teams building these apps measure success by the number of shares. If anyone has a better idea for attacking the problem, I'd be grateful.



Saturday Morning Breakfast Cereal - SOB



Click here to go see the bonus panel!

Hovertext:
Learning etymology is like going through the attic of a recently dead uncle, who seemed mostly domesticated, but who spent his life having x-rated adventures.


1 public comment
jlvanderzwan, 46 days ago:
Well, if we're comparing bestiality vs a scenario in which it is still possible that all parties consented to having sex...
diannemharris, 46 days ago:
While I see your point, there is nothing to rule out that the father is a dog as well, thus ruling out bestiality.
jlvanderzwan, 46 days ago:
Implying the human child was adopted?
diannemharris, 42 days ago:
or that they themselves (the child) are not human either.