Data Manipulation at Scale: Systems and Algorithms
I recently finished a data science course on Coursera, Data Manipulation at Scale: Systems and Algorithms, taught by Bill Howe.
Some parts of the course weren’t new to me, but on the whole it is a good course. Many thanks to Mr. Howe for sharing his comprehensive knowledge of large-scale data manipulation and analysis.
I learned some genuinely surprising things from it:
- Matrix multiplication in a relational database. I had never thought of doing that, but it turns out to be doable.
- Matrix multiplication in MapReduce. The same operation again, but done in a tricky MapReduce way, which is very interesting.
- Consistent hashing. Every developer should know it if they don’t want to be fired.
- Pattern matching. I was impressed by the PRISM program example, where the NSA can use Datalog-like queries to search for patterns that point to potentially bad people.
- Many, many industrial products and examples. I bet Bill has a detailed chronicle of the database family in his head. He spent plenty of time introducing databases one by one, some of which I didn’t even know existed. A real eye-opener.
- Interesting assignments. For example, one assignment is to analyze the sentiment of a Twitter stream and find out which US state is the happiest; another is to write MapReduce programs to “find asymmetric friendships” and “do a join like a relational database”. I do wish there had been some more challenging assignments, though.
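To give a feel for the first item, here is a minimal sketch of matrix multiplication done by a relational database. It assumes each matrix is stored sparsely as (row, column, value) rows, so the product becomes a join on the shared index followed by a grouped sum. The table names `a` and `b`, the column names, the helper `sparse_matmul`, and the choice of SQLite through Python's `sqlite3` module are my own illustrative assumptions, not necessarily what the course uses.

```python
import sqlite3


def sparse_matmul(a_entries, b_entries):
    """Multiply two sparse matrices given as (row, col, value) triples.

    Hypothetical helper for illustration: loads both matrices into an
    in-memory SQLite database and lets SQL do the multiplication.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (row_num INTEGER, col_num INTEGER, val REAL)")
    conn.execute("CREATE TABLE b (row_num INTEGER, col_num INTEGER, val REAL)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", a_entries)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", b_entries)

    # C[i][j] = sum over k of A[i][k] * B[k][j]:
    # the join pairs up entries sharing k, the GROUP BY sums per (i, j).
    rows = conn.execute(
        """
        SELECT a.row_num, b.col_num, SUM(a.val * b.val)
        FROM a JOIN b ON a.col_num = b.row_num
        GROUP BY a.row_num, b.col_num
        ORDER BY a.row_num, b.col_num
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]]; zero entries are omitted.
    A = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]
    B = [(0, 0, 4), (1, 0, 5), (1, 1, 6)]
    for row, col, val in sparse_matmul(A, B):
        print(row, col, val)
```

Zero entries never appear in the tables, so the join only does work for pairs of nonzero values. That is why this layout suits sparse matrices.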