We build Big Data systems for organizations with a compelling need for high-quality data.
SGM helps companies become data-driven organizations that innovate through data.
We practice the soft science of asking questions of data in a way that yields a nuanced picture of how things really stand, not an oversimplification that obscures the challenges that lie ahead.
Our clients range from mature data consumers trying to keep up with terabytes of data coming in from millions of users every day to start-up product teams piloting brand new services.
Our services include:
Analytics: Business Intelligence and Data-Driven Product Development through flighting and A/B testing.
Privacy: Minimizing risk through process and design.
End-to-end Data Pipeline: Instrumentation, Transmission, Extraction, Transformation, Loading and Aggregation (Map Reduce).
Ops and Program Management: Documentation. Hiring and managing engineering resources. Datastore hosting.
Technologies and Concepts we work with daily include: SQL/RDBMS (MySQL, PostgreSQL, SQL Server), NoSQL (Mongo, SimpleDB), Map Reduce (Hadoop, MS Cosmos, Mongo), OLAP (SSAS, Mondrian), Cloud Hosting (EC2, RDS, SQS, S3)
The Data Dictionary
Document what you're collecting
Document what it means
Annotate with analysis
Broaden Data Use In Your Organization
Easier access to data
Faster ramp up for new colleagues
Share and Collaborate
Update documentation together
Track issues together
The Common Data Project is a non-profit exploring new ways to model the data and privacy relationship between service providers and users. Our mission is to safely broaden public access to sensitive personal information for re-use in research and policy-making.
Areas of research include: Policy arguments for the Datatrust. Fair-Trade Data and the Privacy "Food Label." Accounting practices for measurable and verifiable privacy guarantees. Self-Service Map Anonymizer.
SGM is donating time to CDP to conduct technical explorations of differential privacy technology.
The Datatrust Platform will be an "open data" sharing platform for releasing sensitive data records to the public with a measurable privacy guarantee.
The platform will enable more timely data releases by doing away with the need for labor-intensive and inexact anonymization methods like scrubbing, swapping or synthesizing data.
Instead, privacy will be guaranteed in a quantifiable way using an adapted version of differential privacy. Anonymization will happen on-the-fly, on a query-by-query basis.
To date, no anonymization method in use can measure its own effectiveness or allow arbitrary queries of the data. Instead, most anonymization is subjective and results in either woefully inadequate privacy protection or pre-digested aggregate reports that limit the accuracy and usefulness of the data.
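To make the idea of a quantifiable, query-by-query guarantee concrete, here is a minimal sketch of the textbook Laplace mechanism for a counting query. It is an illustration only and not the Datatrust's adapted version of differential privacy, which is not specified here. The names `laplace_noise` and `private_count`, the sample records, and the choice of `epsilon` are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Answer "how many records match?" with epsilon-differential privacy.

    Hypothetical helper for illustration. Smaller epsilon means more noise
    and a stronger privacy guarantee.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true answer by at most 1, so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up records; noise is drawn fresh for every query.
records = [{"age": 34}, {"age": 51}, {"age": 29}, {"age": 62}]
noisy_answer = private_count(records, lambda r: r["age"] >= 50, epsilon=0.5)
print(noisy_answer)
```

The raw records never leave the function; only the noisy answer does. In this sketch `epsilon` is the measurable part of the guarantee: it bounds how much any single person's record can shift the distribution of answers, and each additional query spends more of it.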