We build Big Data systems for organizations with a compelling need for high-quality data.

SGM helps companies become data-driven organizations that innovate through data.

We practice the soft science of asking questions of data in a way that yields a nuanced picture of how things really stand, not an oversimplification that obscures the challenges that lie ahead.

Our clients range from mature data consumers trying to keep up with terabytes of data coming in from millions of users every day to start-up product teams piloting brand new services.

Our services include:

Analytics: Business Intelligence and Data-Driven Product Development through flighting and A/B testing.

Privacy: Minimizing risk through process and design.

End-to-end Data Pipeline: Instrumentation, Transmission, Extraction, Transformation, Loading and Aggregation (Map Reduce).

Ops and Program Management: Documentation. Hiring and managing engineering resources. Datastore hosting.

Technologies and Concepts we work with daily include: SQL/RDBMS (MySQL, PostgreSQL, SQL Server), NoSQL (Mongo, SimpleDB), Map Reduce (Hadoop, MS Cosmos, Mongo), OLAP (SSAS, Mondrian), Cloud Hosting (EC2, RDS, SQS, S3)

The Data Dictionary

Data documentation and issue tracking in a structured database.

Centralize Documentation
    Document what you're collecting
    Document what it means
    Annotate with analysis
    Track issues

Broaden Data Use In Your Organization
    Easier access to data
    Faster ramp up for new colleagues

Share and Collaborate
    Update documentation together
    Track issues together
    Share analysis

In 2007, realizing that there were surmountable technical and policy roadblocks to sharing sensitive personal information, we created The Common Data Project to develop an "open data" sharing service to safely release sensitive data to the public.

The Common Data Project is a non-profit exploring new ways to model the data and privacy relationship between service providers and users. Our mission is to safely broaden public access to sensitive personal information for re-use in research and policy-making.

Areas of research include: Policy arguments for the Datatrust. Fair-Trade Data and the Privacy "Food Label." Accounting practices for measurable and verifiable privacy guarantees. Self-Service Map Anonymizer.

SGM is donating time to CDP to conduct technical explorations of differential privacy technology.


Read our White Paper.

Datatrust Platform

The Datatrust Platform will be an "open data" sharing platform for releasing sensitive data records to the public with a measurable privacy guarantee.

The platform will enable more timely data releases by doing away with the need for labor-intensive and inexact anonymization methods like scrubbing, swapping or synthesizing data.

Instead, privacy will be guaranteed in a quantifiable way using an adapted version of differential privacy. Anonymization will happen on-the-fly, on a query-by-query basis.
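To make "on-the-fly, query-by-query" concrete, here is a minimal sketch of the textbook Laplace mechanism for counting queries, with a simple privacy budget that is drawn down by each query. It is not the Datatrust's adapted version of differential privacy; the class name `PrivateCounter`, its parameters and the sample records are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivateCounter:
    """Hypothetical wrapper that answers counting queries with Laplace noise."""

    def __init__(self, records, total_budget, seed=None):
        self._records = list(records)
        self._remaining = total_budget  # total epsilon available for all queries
        self._rng = random.Random(seed)

    @property
    def remaining_budget(self):
        return self._remaining

    def count(self, predicate, epsilon):
        """Return a noisy count of the records matching `predicate`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for record in self._records if predicate(record))
        # A count changes by at most 1 when one record is added or removed
        # (sensitivity 1), so the noise scale is 1 / epsilon.
        return true_count + laplace_noise(1.0 / epsilon, self._rng)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    people = [{"age": age} for age in range(100)]
    counter = PrivateCounter(people, total_budget=1.0)
    print(counter.count(lambda person: person["age"] >= 50, epsilon=0.5))
    print(counter.remaining_budget)
```

Each answer is the true count plus fresh random noise, so no anonymized copy of the data is ever prepared in advance. A smaller `epsilon` means more noise and stronger privacy per query, and the running budget is what makes the overall guarantee something that can be accounted for rather than asserted.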

To date, no anonymization method in use can measure its own effectiveness or allow arbitrary queries of the data. Instead, most anonymization is subjective and results in either woefully inadequate privacy protection or pre-digested aggregate reports that limit the accuracy and usefulness of the data.

Read more about our work on defining a measurable privacy guarantee and see our demo of differential privacy in action.