4. Regular Expressions (RegEx/grep)
❖ Match a string of text by defining a pattern
❖ Useful for cleaning up or identifying data
❖ “Find” Demo on http://regexpal.com
❖ “Find/Replace” Demo with http://www.sugarscript.com/findandreplace/index.php
❖ Interested? Interactive tutorial on http://regexone.com
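The "Find" and "Find/Replace" ideas can be sketched in a few lines with Python's built-in `re` module. This is only an illustration: the sample text and the phone-number pattern are made up for the example and do not come from the demos linked above.

```python
import re

# Hypothetical messy input: phone numbers written in inconsistent styles.
text = "Call 617-555-0123 or 617.555.0199; fax: (617) 555 0142."

# Pattern: optional "(", 3 digits, optional ")", one separator,
# 3 digits, one separator, 4 digits. Parentheses capture the digit groups.
phone = re.compile(r"\(?(\d{3})\)?[ .-](\d{3})[ .-](\d{4})")

# "Find": collect every piece of text that matches the pattern.
found = [m.group(0) for m in phone.finditer(text)]

# "Find/Replace": rewrite every match into one consistent format,
# reusing the captured groups as \1, \2, \3.
cleaned = phone.sub(r"\1-\2-\3", text)

print(found)
print(cleaned)
```

The same pattern does both jobs: finding identifies the data, and replacing with captured groups cleans it up.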
6. Database History
❖ List-based
❖ Follow link from one record to another (linked-list)
❖ File-system data stores
❖ Based on file-naming conventions, limited by file I/O speeds
❖ Generic data storage and management
❖ Relational modeling of entities and relationships (ER)
7. Relational Modeling: In English
❖ A Group has many People
❖ A Person belongs to one Group
❖ A Group has many Projects
❖ A Project belongs to one Group
❖ A Person has many Projects
❖ A Project has many People
9. Relational Modeling: Tables
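Group: id, name, url
Person: id, name, password, group_id
Project: id, name, url
Membership: person_id, project_id
❖ Person (many) to Group (1)
❖ Group (1) to Project (many)
❖ Person (many) to Project (many), linked through Membership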
10. Relational Modeling: Keys
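Group: id (key), name, url
Person: id (key), name, password, group_id (foreign key)
Project: id (key), name, url
Membership: person_id (foreign key), project_id (foreign key)
❖ Person (many) to Group (1); Group (1) to Project (many); Person (many) to Project (many)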
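One way to see what keys and foreign keys do is to declare the four tables in SQLite through Python's built-in `sqlite3` module. This is a rough sketch, not a definitive schema: the column types, the `NOT NULL` constraints, the composite key on `membership`, and the sample rows are all assumptions added for illustration. `GROUP` is a reserved word in SQL, so that table name has to be quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE "group" (
    id   INTEGER PRIMARY KEY,                    -- key
    name TEXT NOT NULL,
    url  TEXT
);
CREATE TABLE person (
    id       INTEGER PRIMARY KEY,                -- key
    name     TEXT NOT NULL,
    password TEXT,
    group_id INTEGER REFERENCES "group"(id)      -- foreign key
);
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,                    -- key
    name TEXT NOT NULL,
    url  TEXT
);
CREATE TABLE membership (
    person_id  INTEGER REFERENCES person(id),    -- foreign key
    project_id INTEGER REFERENCES project(id),   -- foreign key
    PRIMARY KEY (person_id, project_id)          -- assumption: one row per pair
);
""")

# Hypothetical sample rows.
conn.execute('INSERT INTO "group" (id, name) VALUES (1, ?)', ("Example Lab",))
conn.execute("INSERT INTO person (id, name, group_id) VALUES (1, 'Ada', 1)")

# A person pointing at a group that does not exist is rejected.
try:
    conn.execute("INSERT INTO person (id, name, group_id) VALUES (2, 'Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The foreign key is what lets the database refuse a row that refers to a record that is not there.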
11. Structured Query Language (SQL)
❖ Works in lots of database servers
❖ SQLite, MySQL, PostgreSQL, MS SQL Server
❖ Standard way to:
❖ Find subsets of data based on criteria
❖ Merge data in separate tables
❖ Compute aggregate info
❖ Assumptions
❖ Don’t duplicate data (“data normalization”)
❖ Various parts of your data relate to each other
❖ Your metadata/schema (tables/columns) doesn’t change often
❖ Many frameworks will generate SQL for you
❖ Ask about Database Abstraction Layers
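The three standard uses listed above (subsets, merging tables, aggregates) might look like this in SQLite, driven from Python's `sqlite3` module. The table contents are invented sample data, and the tables are cut down to the columns the queries need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "group" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
INSERT INTO "group" VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO person  VALUES (1, 'Ada', 1), (2, 'Bob', 1), (3, 'Cy', 2);
""")

# Find a subset of data based on criteria (WHERE).
subset = conn.execute(
    "SELECT name FROM person WHERE group_id = ? ORDER BY name", (1,)
).fetchall()

# Merge data in separate tables (JOIN on the foreign key).
merged = conn.execute("""
    SELECT person.name, "group".name
    FROM person JOIN "group" ON person.group_id = "group".id
    ORDER BY person.name
""").fetchall()

# Compute aggregate info (COUNT with GROUP BY).
counts = conn.execute("""
    SELECT "group".name, COUNT(*)
    FROM person JOIN "group" ON person.group_id = "group".id
    GROUP BY "group".name
    ORDER BY "group".name
""").fetchall()

print(subset)  # [('Ada',), ('Bob',)]
print(merged)  # [('Ada', 'Alpha'), ('Bob', 'Alpha'), ('Cy', 'Beta')]
print(counts)  # [('Alpha', 2), ('Beta', 1)]
```

Because the group name is stored once and joined in when needed, renaming a group means changing a single row. That is the practical payoff of not duplicating data.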
12. NoSQL
❖ Sometimes your data isn’t relational and the metadata changes often
❖ Queuing, document storage, logging, real-time, low-latency, concurrency
❖ Read this write-up for more:
❖ http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
13. Tangent: JavaScript Object Notation (JSON)
❖ A human-readable data exchange format
❖ CSV, XML, YAML are some others
❖ Example:
❖ http://media.mongodb.org/zips.json
❖ http://mongohub.todayclose.com (for Mac)
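To get a feel for the format, here is a small sketch using Python's built-in `json` module. The record is hypothetical: its field names are only loosely modeled on a zip-code dataset and are assumptions, not a copy of the linked file.

```python
import json

# Hypothetical record; field names and values are made up for illustration.
record = {"city": "EXAMPLEVILLE", "loc": [-72.5, 42.1], "pop": 1234, "state": "MA"}

# Python object -> JSON text (what gets written to a file or sent over HTTP).
text = json.dumps(record, sort_keys=True)

# JSON text -> Python object (what the receiving program works with).
parsed = json.loads(text)

print(text)
print(parsed["city"], parsed["loc"][0])
```

The text form is plain enough to read by eye, and nested values such as the `loc` list survive the round trip, which is harder to arrange in CSV.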
16. Indexes
❖ An index tracks keys so rows can be found without scanning the whole table
❖ Convention: have an “id” column with an index on it
❖ Why all these indexes?
❖ Multiple ways to get at rows quickly
❖ Creating indexes is tricky
❖ Many frameworks include query logging to help you find slow queries that might need optimizing
❖ Query optimization is a bit of an art
❖ Use the “Explain” command
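As a rough illustration of the "Explain" idea, SQLite offers `EXPLAIN QUERY PLAN`, which reports how it intends to run a query. The sketch below compares the plan for the same lookup before and after adding an index. The table contents and the index name `idx_person_name` are assumptions made up for the example, and other database servers have their own explain syntax and output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO person (name, password) VALUES (?, ?)",
    [(f"user{i}", "x") for i in range(1000)],  # hypothetical sample rows
)

def plan(sql, params=()):
    """Return SQLite's query plan as one string of human-readable steps."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)  # last column holds the description

query = "SELECT * FROM person WHERE name = ?"

before = plan(query, ("user42",))   # no index on name: a full table scan
conn.execute("CREATE INDEX idx_person_name ON person (name)")
after = plan(query, ("user42",))    # now a search that uses the new index

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should mention a scan of `person` and the second should name `idx_person_name`.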
17. Map-Reduce Instead of SQL
❖ Used to query large datasets
❖ Example: Count words in a document
❖ Map: select the data you need to operate on
❖ “emit” one record for each word in a document, keyed by the word
❖ Reduce: combine the mapped data
❖ Sum up the uses of each word, “emitting” one record for each total
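The word-count example can be sketched in plain Python. This is a single-machine toy, not how any particular database runs map-reduce: the function names are made up, and the grouping step in the middle stands in for the work a real system does when it routes emitted records to reducers.

```python
from collections import defaultdict

def map_words(document):
    """Map: emit one (word, 1) record for each word in the document."""
    for word in document.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: sum the records for one word and emit a single total."""
    return word, sum(counts)

def word_count(documents):
    # Group the emitted records by key (the word) so that each
    # reduce call sees all the values for exactly one word.
    grouped = defaultdict(list)
    for document in documents:
        for word, one in map_words(document):
            grouped[word].append(one)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())

totals = word_count(["the cat sat", "the cat ran"])
print(totals)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

Because each map call looks at one document and each reduce call looks at one word, both steps can be spread across many machines, which is why the approach suits large datasets.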
18. Picking Data Storage Strategies
❖ If you just need to dump data and pull it out by some id, use a NoSQL solution (MongoDB is simple)
❖ flexible, easy to start with
❖ If you are modeling an app, a relational database is usually the right answer (MySQL/PostgreSQL are standard)
❖ Database modeling is REALLY important to get right at the start of your project, because it is a pain to change later
❖ Names matter – choose your table names carefully
❖ PS: we can try stuff out on Amazon’s cloud services for free