Software developer, racing fan

BLeak: automatically debugging memory leaks in web applications


BLeak: Automatically debugging memory leaks in web applications Vilk & Berger, PLDI’18

BLeak is a Browser Leak debugger that finds memory leaks in web applications. You can use BLeak to test your own applications by following the instructions at http://bleak-detector.org.

Guided by BLeak, we identify and fix over 50 memory leaks in popular libraries and apps including Airbnb, AngularJS, Google Analytics, Google Maps SDK, and jQuery. BLeak’s median precision is 100%; fixing the leaks it identifies reduces heap growth by an average of 94%, saving from 0.5MB to 8MB per round trip.

Why are web application memory leaks so problematic?

Memory leaks in web applications are a pervasive problem. They lead to higher garbage collection frequency and overhead, reduced application responsiveness, and even browser tab crashes. Existing memory leak detection approaches don’t work well in the browser environment though:

  • Staleness detection assumes leaked memory is rarely touched, but web apps regularly interact with leaked state (e.g. via event listeners).
  • Growth-based techniques assume that leaked objects are uniquely owned, or that leaked objects form strongly connected components in the heap graph. In a web application, objects frequently have multiple owners, and all roads lead back to window.
  • Techniques that depend on static type information are not applicable since JavaScript is dynamically typed.

The current state of the art is manual processing of heap snapshots. As we show, this approach does not effectively identify leaking objects or provide useful diagnostic information, and it thus does little to help developers locate and fix memory leaks.

Consider the following extract from the Firefox debugger, which leaks memory every time a developer opens a source file, since the event listeners are never removed. (The leak was found using BLeak, of course!)

Using heap snapshots to try to diagnose the problem leaves developers staring at heap snapshots like this:

The top item in the heap snapshot, Array, conflates all arrays in the application under one heading, and the (array) entry refers to internal V8 arrays that are not under the application’s direct control. The primary leak actually involves the Preview object, but it appears low on this list and has a small retained size, making it an unlikely target for investigation using this view. And even if a developer does decide to look into the Preview entry, finding the responsible code based on the retaining path information in the snapshot is not easy:

An overview of BLeak

Instead of poring over heap snapshots, for the same memory leak as above BLeak will produce diagnostic output that looks like this:

The developer is pointed directly to the problematic source code.

BLeak automatically detects, ranks, and diagnoses memory leaks. The central observation is that in modern web apps users frequently return to the same visual state: for example, the news feed in Facebook, the property listing page in Airbnb, or the inbox view in Gmail.

… these round trips can be viewed as an oracle to identify leaks. Because visits to the same (approximate) visual state should consume roughly the same amount of memory, sustained memory growth between visits is a strong indicator of a memory leak.

To use BLeak, you need to provide a short driver loop that cycles the application through the desired visual states. The loop is specified as an array of objects, each providing a check function to validate state preconditions, and a next function to transition to the next state. The final transition must take you back to the initial state, so that the cycle can repeat.

An example makes this much easier to understand. In the Firefox debugger case, the debugger opens to a view with a list of all documents in the application being debugged. Clicking on the first document in the list (main.js) opens it in the text editor. Closing the editor takes us back to the list view again. Here’s the loop specification:
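
A loop of this shape can be sketched as follows. This is an illustrative sketch, not the paper's listing: the `ui` object is a hypothetical stand-in for the debugger's DOM, and `runLoop` is a toy synchronous driver written for this example rather than BLeak's own (a real driver has to wait for each precondition to become true). Only the `check`/`next` structure comes from the description above.

```javascript
// Hypothetical stand-in for the debugger UI; a real loop would query the DOM.
const ui = { view: "list", documents: ["main.js"] };

// Each step validates its state with check() and moves on with next().
const loop = [
  {
    // State 1: the document list is showing and main.js is the first entry.
    check: () => ui.view === "list" && ui.documents[0] === "main.js",
    // Transition: open the first document in the text editor.
    next: () => { ui.view = "editor"; },
  },
  {
    // State 2: the text editor is open.
    check: () => ui.view === "editor",
    // Final transition: close the editor, returning to state 1.
    next: () => { ui.view = "list"; },
  },
];

// Toy driver (an assumption, not BLeak's): runs the loop a fixed number of
// times and calls onRoundTrip each time the application is back in state 1.
function runLoop(steps, iterations, onRoundTrip) {
  for (let i = 0; i < iterations; i++) {
    for (const step of steps) {
      if (!step.check()) throw new Error("state precondition failed");
      step.next();
    }
    onRoundTrip(i);
  }
}

let roundTrips = 0;
runLoop(loop, 8, () => { roundTrips += 1; });
console.log(roundTrips, ui.view); // prints: 8 list
```

The `onRoundTrip` callback marks the point at which a tool like BLeak would take its heap snapshot, since the application is back in the first visual state.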

Appendix B contains the driver loops used for a variety of applications including Airbnb. You can see that the loops are fairly easy to write.

BLeak uses the provided loop script to drive the application in a loop (eight iterations by default).

After each visit to the first visual state in the loop, BLeak takes a heap snapshot and tracks specific paths from GC roots that are continually growing. BLeak treats a path as growing if the object identified by that path gains more outgoing references (e.g., when an array expands or when properties are added to an object).
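
The growth check can be illustrated with a small sketch. It assumes each snapshot has already been reduced to a map from a path (written here as a dotted string) to the number of outgoing references of the object at that path, and it reads "continually growing" as "strictly more outgoing references in every successive snapshot". The path names and the `findGrowingPaths` helper are hypothetical; BLeak's real implementation works on full heap graphs.

```javascript
// Keep only the paths whose outgoing-reference count increases in every
// consecutive pair of snapshots.
function findGrowingPaths(snapshots) {
  const [first, ...rest] = snapshots;
  let candidates = Object.keys(first);
  let previous = first;
  for (const snapshot of rest) {
    candidates = candidates.filter(
      (path) => path in snapshot && snapshot[path] > previous[path]
    );
    previous = snapshot;
  }
  return candidates;
}

// Hypothetical data: one snapshot per visit to the first visual state.
const snapshots = [
  { "window.cm.listeners": 2, "window.cache": 5, "window.tmp": 1 },
  { "window.cm.listeners": 3, "window.cache": 5, "window.tmp": 4 },
  { "window.cm.listeners": 4, "window.cache": 5, "window.tmp": 2 },
];

const growing = findGrowingPaths(snapshots);
console.log(growing); // prints: [ 'window.cm.listeners' ]
```

Here `window.cache` never grows and `window.tmp` grows once and then shrinks, so only the listener array survives as a candidate leak root.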

In the Firefox debugger example, BLeak finds four leak roots: an array within the codeMirror object that contains scroll event listeners, and event listener lists for mouseover, mouseup, and mousedown events on the DOM element containing the text editor.

The detected leak roots are then ranked to help developers prioritise their attention. Ranking is done via a leak share metric which first prunes objects in the graph reachable by non-leak roots, and then splits the credit for the remaining objects equally among the leak roots that retain them.
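
As a rough sketch of that ranking, the heap can be modelled as a map from node name to `{ size, refs }`. The graph, node names, and sizes below are made up for illustration, and the handling of details the text leaves open (for example, whether a leak root's own size counts towards its share) is an assumption of this sketch.

```javascript
// Collect every node reachable from `start` without entering a blocked node.
function reachable(graph, start, blocked) {
  const seen = new Set();
  const stack = [...start];
  while (stack.length > 0) {
    const node = stack.pop();
    if (seen.has(node) || blocked.has(node)) continue;
    seen.add(node);
    for (const ref of graph[node].refs) stack.push(ref);
  }
  return seen;
}

function leakShare(graph, gcRoots, leakRoots) {
  // Step 1: prune everything reachable from GC roots without going through
  // a leak root; those objects would stay alive even if the leaks were fixed.
  const nonLeak = reachable(graph, gcRoots, new Set(leakRoots));
  // Step 2: find what each leak root retains among the remaining objects.
  const retained = leakRoots.map((root) => reachable(graph, [root], nonLeak));
  // Step 3: count how many leak roots retain each object.
  const owners = new Map();
  for (const nodes of retained) {
    for (const node of nodes) owners.set(node, (owners.get(node) || 0) + 1);
  }
  // Step 4: split each object's size equally among its retaining leak roots.
  return leakRoots
    .map((root, i) => {
      let share = 0;
      for (const node of retained[i]) share += graph[node].size / owners.get(node);
      return [root, share];
    })
    .sort((a, b) => b[1] - a[1]);
}

// Hypothetical heap: sizes are in arbitrary units.
const graph = {
  window:    { size: 0,  refs: ["listeners", "cache", "config"] },
  listeners: { size: 10, refs: ["handler", "shared", "theme"] },
  cache:     { size: 10, refs: ["shared"] },
  config:    { size: 5,  refs: ["theme"] },
  handler:   { size: 40, refs: [] },
  shared:    { size: 20, refs: [] },
  theme:     { size: 30, refs: [] },
};

const ranking = leakShare(graph, ["window"], ["listeners", "cache"]);
console.log(ranking); // prints: [ [ 'listeners', 60 ], [ 'cache', 20 ] ]
```

`theme` is pruned because `config` also keeps it alive, while `shared` is retained by both leak roots and so contributes 10 units to each.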

The final piece of the puzzle is pinpointing the sources of the detected leaks in order to produce the report we saw earlier.

BLeak … reloads the application and uses its proxy to transparently rewrite all of the JavaScript on the page, exposing otherwise-hidden edges in the heap as object properties. BLeak uses JavaScript reflection to instrument identified leak roots to capture stack traces when they grow and when they are overwritten (not just where they were allocated). With this instrumentation in place, BLeak uses the developer-provided script to run one final iteration of the loop to collect stack traces. These stack traces directly zero in on the code responsible for leak growth.
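
The idea of capturing a stack trace whenever a leak root grows can be sketched with a standard JavaScript `Proxy`. This is only a stand-in for BLeak's own instrumentation, whose details are not given here: the `instrument` helper and the `addScrollListener` function are hypothetical, and the sketch covers growth only, not overwrites.

```javascript
// Wrap a leak root so that every newly added property records a stack trace.
function instrument(target, traces) {
  return new Proxy(target, {
    set(obj, prop, value, receiver) {
      // A property that does not exist yet means the object is growing.
      if (!(prop in obj)) {
        traces.push(new Error("grew at property " + String(prop)).stack);
      }
      return Reflect.set(obj, prop, value, receiver);
    },
  });
}

const traces = [];
// Hypothetical leak root: an array of scroll listeners.
const listeners = instrument([], traces);

// Hypothetical leaking code: adds a listener and never removes it.
function addScrollListener() {
  listeners.push(() => {});
}

addScrollListener();
addScrollListener();

console.log(listeners.length, traces.length); // prints: 2 2
console.log(traces[0].includes("addScrollListener")); // prints: true
```

Each recorded trace names the function that performed the growth, which is the kind of information the final report needs.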

BLeak in action

BLeak is evaluated using a corpus of five popular web applications that in turn use a variety of libraries (React, AngularJS, Google Analytics, etc.):

  • Airbnb (looping between the page listing all services on Airbnb, and the page listing only homes and rooms for rent)
  • Piwik 3.0.2 – an open source analytics platform. BLeak repeatedly visits the main dashboard page.
  • Loomio 1.8.66 – a collaborative platform for group decision making. BLeak transitions between a group page listing all threads in a group, and the first thread listed on the page.
  • Mailpile v1.0.0 – an open source mail client. BLeak cycles between the inbox and the first four emails in the inbox.
  • Firefox debugger – while debugging Mozilla’s SensorWeb. The BLeak loop repeatedly opens and closes main.js in the debugger’s text editor.

Driver loops for all of these programs can be found in Appendix B, and the full list of detected leaks in these applications can be found in Appendix A.

Overall, BLeak finds 59 distinct memory leaks across the five applications, all of which were unknown to application developers. 27 of the discovered leaks were actually in libraries used by the web applications (6 of the 27 had been independently diagnosed by the library developers and had pending fixes). That leaves 32 memory leaks detected directly in the application code.

We reported all 32 new memory leaks to the relevant developers along with our fixes; 16 are now fixed, and 4 have fixes in code review. We find new leaks in popular applications and libraries including Airbnb, AngularJS, Google Maps SDK, Google Tag Manager, and Google Analytics.

The key results are summarised in the table below.

  • BLeak has an average precision of 96.8%, and a median precision of 100%. Overall there were only three false positives, all caused by an object that continuously grows until some threshold or timeout is reached; increasing the number of round trips would have removed these.
  • BLeak accurately pinpoints the code responsible for the leaks in all but one case (in which it fails to record a stack trace).
  • Guided by the BLeak reports, the authors were able to fix every memory leak. It took about 15 minutes per leak to implement a fix.
  • BLeak locates, ranks, and diagnoses memory leaks in less than seven minutes on these applications. Most of the time goes on receiving and parsing Chrome’s JSON-based heap snapshots.
  • On average, fixing the memory leaks that BLeak reports eliminates over 93% of all heap growth.
  • Of the memory leaks BLeak finds, at least 77% would not be found with a staleness-based approach.

We show that BLeak has high precision and finds numerous previously-unknown memory leaks in web applications and libraries. BLeak is open source and is available for download at http://bleak-detector.org.




Google Tracks its Users Even if They Opt-Out of Tracking


Google is tracking you, even if you turn off tracking:

Google says that will prevent the company from remembering where you've been. Google's support page on the subject states: "You can turn off Location History at any time. With Location History off, the places you go are no longer stored."

That isn't true. Even with Location History paused, some Google apps automatically store time-stamped location data without asking.

For example, Google stores a snapshot of where you are when you merely open its Maps app. Automatic daily weather updates on Android phones pinpoint roughly where you are. And some searches that have nothing to do with location, like "chocolate chip cookies," or "kids science kits," pinpoint your precise latitude and longitude - accurate to the square foot - and save it to your Google account.

Google isn't the problem. Google is a symptom of the bigger problem: surveillance capitalism. As long as surveillance is the business model of the Internet, things like this are inevitable.

BoingBoing story.

1 public comment
MotherHydra
14 hours ago
reply
"surveillance capitalism." Let that sink in and try not to vomit.
Space City, USA

Voting Software

There are lots of very smart people doing fascinating work on cryptographic voting protocols. We should be funding and encouraging them, and doing all our elections with paper ballots until everyone currently working in that field has retired.
13 public comments
caffeinatedhominid
7 hours ago
reply
Yep.
tante
3 days ago
reply
xkcd on voting software is spot-on
Oldenburg/Germany
wmorrell
4 days ago
reply
Hazmat suit, too. Just to be safe.
rjstegbauer
5 days ago
reply
Amen!! Paper... paper... paper. It's simple. It's trivial to recount. Everyone already knows how to use it. It's cheap. It's verifiable. Just... use... paper.
ianso
5 days ago
reply
Yes!
Brussels
ChrisDL
5 days ago
reply
accurate.
New York
reconbot
6 days ago
reply
Legitimately share this comic with anyone who represents you in government.
New York City
cheerfulscreech
6 days ago
reply
Truth.
jth
6 days ago
reply
XKCD Nails Secure Electronic Voting.
Saint Paul, MN, USA
skorgu
6 days ago
reply
100% accurate.
jsled
6 days ago
reply
endorsed; co-signed; it. me. &c.

(alt text: «There are lots of very smart people doing fascinating work on cryptographic voting protocols. We should be funding and encouraging them, and doing all our elections with paper ballots until everyone currently working in that field has retired.»)
South Burlington, Vermont
srsly
6 days ago
Seconding this policy ^^

Saturday Morning Breakfast Cereal - Cattle




Hovertext:
A superior option to vegetarianism is to breed an organism so dickish that the ethical conundrum disappears entirely.


Today's News:

Stay tuned, civics dorks!

1 public comment
jimwise
9 days ago
reply
Lol

Phoenix Tips and Tricks


As newcomers get up and running quickly with Phoenix, we see folks hit a few common issues that they can cleanly solve with a few simple tips.

Override action/2 in your controllers

Oftentimes, you’ll find yourself repeatedly needing to access connection information in your controller actions, such as conn.assigns.current_user, or similarly reaching deep into nested connection information. This becomes tedious and obscures the code. We could extract the lookup to a function, such as current_user(conn), but then we needlessly repeat the map access on every call when the lookup only needs to happen once. There’s a better way.

Phoenix controllers all contain an action/2 plug, which is called last in the controller pipeline. This plug is responsible for calling the function specified in the route, but Phoenix makes it overridable so you can customize your controller actions. For example, imagine the following controller:

defmodule MyApp.PostController do
  use MyApp.Web, :controller

  def show(conn, %{"id" => id}) do
    {:ok, post} = Blog.get_post_for_user(conn.assigns.current_user, id)
    render(conn, "show.html", owner: conn.assigns.current_user, post: post)
  end

  def create(conn, %{"post" => post_params}) do
    {:ok, post} = Blog.publish_post(conn.assigns.current_user, post_params)
    redirect(conn, to: user_post_path(conn, conn.assigns.current_user, post))
  end
end

Not terrible, but the repeated conn.assigns.current_user access gets tiresome and obscures what we care about, namely the current_user. Let’s override action/2 to see how we can clean this up:

defmodule MyApp.PostController do
  use MyApp.Web, :controller

  def action(conn, _) do
    args = [conn, conn.params, conn.assigns[:current_user] || :guest]
    apply(__MODULE__, action_name(conn), args)
  end

  def show(conn, %{"id" => id}, current_user) do
    {:ok, post} = Blog.get_post_for_user(current_user, id)
    render(conn, "show.html", owner: current_user, post: post)
  end

  def create(conn, %{"post" => post_params}, current_user) do
    {:ok, post} = Blog.publish_post(current_user, post_params)
    redirect(conn, to: user_post_path(conn, current_user, post))
  end
end

Much nicer. We simply overrode action/2 on the controller, and modified the arities of our controller actions to include a new third argument, the current_user, or :guest if we aren’t enforcing authentication. If we want to apply this to multiple controllers, we can extract it to a MyApp.Controller module:

defmodule MyApp.Controller do
  defmacro __using__(_) do
    quote do
      def action(conn, _), do: MyApp.Controller.__action__(__MODULE__, conn)
      defoverridable action: 2
    end
  end

  def __action__(controller, conn) do
    args = [conn, conn.params, conn.assigns[:current_user] || :guest]
    apply(controller, Phoenix.Controller.action_name(conn), args)
  end
end

Now any controller that wants to use our modified actions can use MyApp.Controller on a case-by-case basis. We also made sure to make action/2 overridable again to allow callers downstream to customize their own behavior.

Rendering the ErrorView directly

Most folks use their ErrorView to handle rendering exceptions after they are caught and translated to the proper status code, such as an Ecto.NoResultsError rendering the “404.html” template or a Phoenix.ActionClauseError rendering the “400.html” template. What many miss is the fact that the ErrorView is just like any other view. It can and should be called directly to render responses for your error cases rather than relying on exceptions for all error possibilities. For example, imagine handling the error cases for our PostController in the previous example:

def create(conn, %{"post" => post_params}, current_user) do
  with {:ok, post} <- Blog.publish_post(current_user, post_params) do
    redirect(conn, to: user_post_path(conn, current_user, post))
  else
    {:error, %Ecto.Changeset{} = changeset} -> render(conn, "edit.html", changeset: changeset)
    {:error, :unauthorized} ->
      conn
      |> put_status(401)
      |> render(ErrorView, :"401", message: "You are not authorized to publish posts")
    {:error, :rate_limited} ->
      conn
      |> put_status(429)
      |> render(ErrorView, :"429", message: "You have exceeded the max allowed posts for today")
  end
end

Here we’ve used the Elixir 1.3 with/else expressions. Note how we are able to succinctly send the 401 and 429 responses by directly rendering our ErrorView. We also passed the template name as an atom, such as :"401", so the template is chosen based on the accept headers, for example "401.json" or "401.html".

Avoid Task.async if you don’t plan to Task.await

Elixir Tasks are great for cheap concurrency and parallelizing bits of work, but we often see Task.async used incorrectly. The most important thing to realize is that the caller is linked to the task. This means that if the task crashes, the caller does as well, and vice-versa. For example, the following code is perfectly fine because we await both tasks and we expect to crash if they fail:

def create(conn, %{"access_code" => code}) do
  facebook = Task.async(fn -> Facebook.get_token(code) end)
  twitter  = Task.async(fn -> Twitter.get_token(code) end)

  render(conn, "create.json", facebook: Task.await(facebook),
                              twitter: Task.await(twitter))
end

In this case, we want to fetch a token from Facebook and Twitter, and we can do this work in parallel since the tasks are not coupled in any way. When rendering our JSON response for the client, we can await both tasks and send the response back. This use of Task.async and Task.await is just fine, but now imagine another case where we want to fire off a quick task and immediately respond to the client.

def delete(conn, _, current_user) do
  {:ok, user} = Accounts.cancel_account(current_user)
  Task.async(fn -> Audits.alert_cancellation_notice(user) end)

  conn
  |> signout()
  |> put_flash(:info, "So sorry to see you go!")
  |> redirect(to: "/")
end

In this case, we want to notify our staff about an account cancellation, say by sending an email, but we don’t want the client to wait on this particular work. It might feel natural to use Task.async here, but since we aren’t awaiting the result and the client isn’t concerned about its success, we have an issue. The task is linked to the caller, so any abnormal exit on either side will crash the other. The client could get a 500 error after their account has been canceled and not be sure if their operation was successful. Likewise, our staff notice could be brought down by an error when sending the response, preventing our staff from being alerted of the completed event. We can use Task.Supervisor and its async_nolink to achieve an offloaded process that is isolated under its own supervision tree.

First, we’d need to add our own Task.Supervisor to our supervision tree, in lib/my_app.ex:

children = [
  ...,
  supervisor(Task.Supervisor, [[name: MyApp.TaskSupervisor]])
]

Next, we can now offload the task to our supervisor. We’ll also use the async_nolink function to isolate the task from the caller:

def delete(conn, _, current_user) do
  {:ok, user} = Accounts.cancel_account(current_user)
  Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
    Audits.alert_cancellation_notice(user)
  end)

  conn
  |> signout()
  |> put_flash(:info, "So sorry to see you go!")
  |> redirect(to: "/")
end

Now our task is properly offloaded to its own supervisor, which will take care of any failures and proper logging. Likewise, a crash in either the task or the controller won’t affect the other.

With these tips, you’ll keep your code clean and to the point, and isolated when required.


Nicholas Weaver on Cryptocurrencies


This is well worth reading (non-paywalled version). Here's the opening:

Cryptocurrencies, although a seemingly interesting idea, are simply not fit for purpose. They do not work as currencies, they are grossly inefficient, and they are not meaningfully distributed in terms of trust. Risks involving cryptocurrencies occur in four major areas: technical risks to participants, economic risks to participants, systemic risks to the cryptocurrency ecosystem, and societal risks.

I haven't written much about cryptocurrencies, but I share Weaver's skepticism.
