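16. PCA/SVD use cases
[Figure: PCA/SVD workflow. The PCA() function builds a PCA model from the original data (n rows, p cols). APPLY_PCA(…, num_components=k) turns the original data into transformed data (n rows, k cols), and APPLY_INVERSE_PCA() maps that back to approximated data. The transformed data feeds a training function (e.g. LOGISTIC_REG, RF_CLASSIFIER) that produces a model. New data passes through APPLY_PCA(…, num_components=k) to give transformed new data. A predict function (e.g. PREDICT_LOGISTIC_REG, PREDICT_RF_CLASSIFIER) then scores it to produce predicted values (e.g. will the user click the ads).]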
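The workflow in the figure (fit once, project to k columns, optionally invert, and reuse the same projection on new data) can be sketched in NumPy. This is a minimal illustration built on NumPy's SVD, not Vertica's implementation. `PCAModel`, `apply` and `apply_inverse` are hypothetical names chosen to mirror PCA(), APPLY_PCA and APPLY_INVERSE_PCA, and the random input stands in for real data.

```python
import numpy as np


class PCAModel:
    """Minimal PCA fitted with NumPy's SVD (illustrative; not Vertica's implementation)."""

    def __init__(self, data: np.ndarray):
        # Center each column; keep the mean so new data is centered the same way.
        self.mean = data.mean(axis=0)
        centered = data - self.mean
        # Rows of vt are the principal axes, ordered by decreasing singular value.
        _, self.singular_values, self.vt = np.linalg.svd(centered, full_matrices=False)

    def apply(self, data: np.ndarray, num_components: int) -> np.ndarray:
        # Project (n x p) data onto the first k principal axes, giving (n x k).
        return (data - self.mean) @ self.vt[:num_components].T

    def apply_inverse(self, transformed: np.ndarray) -> np.ndarray:
        # Map (n x k) scores back to (n x p); exact (up to rounding) only when k == p.
        k = transformed.shape[1]
        return transformed @ self.vt[:k] + self.mean


rng = np.random.default_rng(0)
original = rng.normal(size=(100, 5))                      # n = 100 rows, p = 5 cols
model = PCAModel(original)
transformed = model.apply(original, num_components=2)     # n rows, k = 2 cols
approximated = model.apply_inverse(transformed)           # back to n rows, p cols

new_data = rng.normal(size=(10, 5))
transformed_new = model.apply(new_data, num_components=2)  # same projection for new rows

print(transformed.shape, approximated.shape, transformed_new.shape)
```

In the figure's pipeline, `transformed` would be the input to a training function and `transformed_new` the input to the matching predict function. Because both are projected with the same fitted model, the classifier sees consistent features.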
24. Using the SecureData UDx for FPE (format-preserving encryption) to convert sensitive data into protected columns
COPY users(name, ssn_raw FILLER VARCHAR(11), cc_num,
           ssn AS VoltageSecureProtect(ssn_raw USING PARAMETERS format='ssn',
                                       config_dfs_path='/voltagesecure/conf'))
FROM 'users.csv' DELIMITER ',';
SELECT * FROM users;
name | ssn | cc_num
-------+-------------+---------------------
Alice | 613-52-2222 | 1111-2222-3333-4444
Bob | 284-46-3333 | 2222-3333-4444-5555
Carol | 019-79-4444 | 3333-4444-5555-6666
(3 rows)
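Format-preserving encryption keeps the SSN's ddd-dd-dddd shape while changing the digits, so the protected value still fits the VARCHAR(11) column. The sketch below is a toy alternating Feistel network over decimal digits, loosely modeled on the structure of NIST FF1. It is not FF1, not Voltage's algorithm, and not secure for production use. `fpe_digits`, `protect_ssn`, `unprotect_ssn` and the hard-coded key are hypothetical; a real deployment obtains keys from the Voltage key server.

```python
import hashlib
import hmac
import re


def _round_value(key: bytes, round_index: int, half: str, modulus: int) -> int:
    # Keyed round function: HMAC-SHA256 over the round number and the other half.
    mac = hmac.new(key, bytes([round_index]) + half.encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % modulus


def fpe_digits(key: bytes, digits: str, rounds: int = 10, decrypt: bool = False) -> str:
    """Toy alternating Feistel network over a decimal string (output keeps its length)."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    if not decrypt:
        for i in range(rounds):
            m = u if i % 2 == 0 else v  # width of the half being updated this round
            c = (int(a) + _round_value(key, i, b, 10 ** m)) % 10 ** m
            a, b = b, str(c).zfill(m)
    else:
        for i in reversed(range(rounds)):
            m = u if i % 2 == 0 else v
            c, b = b, a  # undo the swap performed in round i
            a = str((int(c) - _round_value(key, i, b, 10 ** m)) % 10 ** m).zfill(m)
    return a + b


def protect_ssn(key: bytes, ssn: str) -> str:
    # Encrypt only the 9 digits, then re-insert the hyphens in the same positions.
    if not re.fullmatch(r"\d{3}-\d{2}-\d{4}", ssn):
        raise ValueError("expected ddd-dd-dddd")
    enc = fpe_digits(key, ssn.replace("-", ""))
    return f"{enc[:3]}-{enc[3:5]}-{enc[5:]}"


def unprotect_ssn(key: bytes, protected: str) -> str:
    dec = fpe_digits(key, protected.replace("-", ""), decrypt=True)
    return f"{dec[:3]}-{dec[3:5]}-{dec[5:]}"


key = b"demo-key-not-for-production"  # hypothetical key for illustration only
protected = protect_ssn(key, "123-45-6789")
print(protected)                      # ddd-dd-dddd shape; digits depend on the key
print(unprotect_ssn(key, protected))  # 123-45-6789
```

Because the same key and input always give the same output, protected values can still be compared for equality. The later join on protected columns relies on this property.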
CREATE ACCESS POLICY ON users FOR COLUMN cc_num
  CASE WHEN NOT has_role('trusted_app')
       THEN VoltageSecureProtect(cc_num USING PARAMETERS format='cc',
                                 config_dfs_path='/voltagesecure/conf')
       ELSE cc_num
  END
ENABLE;
SELECT * FROM users;
name | ssn | cc_num
-------+-------------+---------------------
Alice | 613-52-2222 | 1111-10PD-S98R-LK5J
Bob | 284-46-3333 | 2222-JT3K-UU13-VM1V
Carol | 019-79-4444 | 3333-4DBC-QEV1-H79B
(3 rows)
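The policy above rewrites `cc_num` at query time based on the caller's role. The sketch below is only a rough, runnable analogy and does not reflect how Vertica evaluates access policies. It uses Python's built-in `sqlite3`, registers hypothetical `has_role` and `mask_cc` functions, and uses a view to play the part of the policy with the same `CASE WHEN NOT has_role('trusted_app') ...` shape. `mask_cc` is a hash-based placeholder that is neither encryption nor reversible.

```python
import hashlib
import sqlite3

# Roles granted to the current "session"; a stand-in for Vertica's role system.
current_roles: set[str] = set()


def has_role(role: str) -> int:
    # SQLite has no boolean type, so return 1 or 0.
    return int(role in current_roles)


def mask_cc(cc_num: str) -> str:
    # Hypothetical placeholder for the protect call: keep the first group and replace
    # the remaining groups with hash-derived characters of the same length.
    digest = hashlib.sha256(cc_num.encode()).hexdigest().upper()
    groups = cc_num.split("-")
    out, pos = [groups[0]], 0
    for group in groups[1:]:
        out.append(digest[pos:pos + len(group)])
        pos += len(group)
    return "-".join(out)


conn = sqlite3.connect(":memory:")
conn.create_function("has_role", 1, has_role)
conn.create_function("mask_cc", 1, mask_cc)
conn.execute("CREATE TABLE users_raw (name TEXT, cc_num TEXT)")
conn.executemany("INSERT INTO users_raw VALUES (?, ?)",
                 [("Alice", "1111-2222-3333-4444"), ("Bob", "2222-3333-4444-5555")])
# The view stands in for the column access policy: same CASE expression shape.
conn.execute("""
    CREATE VIEW users AS
    SELECT name,
           CASE WHEN NOT has_role('trusted_app') THEN mask_cc(cc_num) ELSE cc_num END AS cc_num
    FROM users_raw
""")

query = "SELECT cc_num FROM users WHERE name = 'Alice'"
masked = conn.execute(query).fetchone()[0]  # caller lacks trusted_app: masked value
current_roles.add("trusted_app")
clear = conn.execute(query).fetchone()[0]   # caller has trusted_app: 1111-2222-3333-4444
print(masked, clear)
```

As in the slide, the stored data does not change. Only what a given caller sees depends on the role check.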
25. Using the SecureData UDx to decrypt protected columns
SELECT VoltageSecureProtectAllKeys('1111-2222-3333-4444' USING
PARAMETERS format='cc') OVER ();
data | protected
---------------------+---------------------
1111-2222-3333-4444 | 1111-AOJV-H6OM-6RJD
1111-2222-3333-4444 | 1111-FPMR-ZW7J-S6EO
1111-2222-3333-4444 | 1111-10PD-S98R-LK5J
(3 rows)
SELECT name, ssn
FROM users_secure u
JOIN (SELECT VoltageSecureProtectAllKeys('1111-2222-3333-4444'
             USING PARAMETERS format='cc', config_dfs_path='/voltagesecure/conf') OVER ()) pak
  ON u.cc_num = pak.protected;
name | ssn
-------+-------------
Alice | 613-52-2222
(1 row)
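The output above suggests that the all-keys function returns one protected form of the input per available key (three rows). Joining on any of those forms then finds the matching row regardless of which key protected it. The sketch below models that lookup. It uses an HMAC-SHA256 token in place of the real FPE call, and `KEY_VERSIONS`, `protect`, `protect_all_keys` and `find_by_cc` are hypothetical names. The sample rows reuse values from the slide, and assigning Alice's card to key version 3 is only an illustrative assumption.

```python
import hashlib
import hmac

# Hypothetical key versions (e.g. after key rotation); real keys come from a key server.
KEY_VERSIONS = {1: b"key-v1", 2: b"key-v2", 3: b"key-v3"}


def protect(value: str, key: bytes) -> str:
    # Deterministic keyed token; a stand-in for the format-preserving protect call.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def protect_all_keys(value: str) -> list[tuple[str, str]]:
    # Analogue of the all-keys call: one (data, protected) row per key version.
    return [(value, protect(value, key)) for key in KEY_VERSIONS.values()]


# Stored rows: each cc_num was protected under whichever key was current at load time.
users_secure = [
    ("Alice", "613-52-2222", protect("1111-2222-3333-4444", KEY_VERSIONS[3])),
    ("Bob", "284-46-3333", protect("2222-3333-4444-5555", KEY_VERSIONS[1])),
]


def find_by_cc(plain_cc: str) -> list[tuple[str, str]]:
    # Equivalent of the join: match the stored value against every candidate.
    candidates = {protected for _, protected in protect_all_keys(plain_cc)}
    return [(name, ssn) for name, ssn, cc in users_secure if cc in candidates]


print(find_by_cc("1111-2222-3333-4444"))  # [('Alice', '613-52-2222')]
```

This lookup never decrypts the stored column. It works because protection is deterministic per key, so equality on protected values implies equality on plaintext under the same key.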
27. Added license auditing for external tables (ORC/Parquet)
SELECT audit_license_size();
audit_license_size
----------------------------------------------
Raw Data Size: 0.00GB +/- 0.00GB
License Size : 500.00GB
Utilization : 0%
Audit Time : 2018-03-26 17:39:07.236713-04
Compliance Status : The database is in compliance with respect to raw data size.
...
(1 row)
SELECT database_size_bytes, audited_data
FROM license_audits
ORDER BY audit_end_timestamp DESC
LIMIT 4;
database_size_bytes | audited_data
---------------------+--------------
150 | Total
150 | External -- NEW audited data type
0 | Flex
0 | Regular
(4 rows)
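The two outputs are consistent: 150 bytes of external data rounds to 0.00GB of raw data and 0% of a 500GB license. The arithmetic below reproduces that check. It rests on two assumptions: that the Total row is the sum of the per-type rows (which the figures above agree with), and that "GB" means 2**30 bytes here (with 10**9 the rounded results are the same).

```python
GIB = 1024 ** 3  # assumption: the audit's "GB" is treated as 2**30 bytes

# Bytes per audited data type, taken from the license_audits rows above.
audited = {"External": 150, "Flex": 0, "Regular": 0}
total = sum(audited.values())          # should equal the Total row
license_bytes = 500 * GIB              # "License Size : 500.00GB"

utilization_pct = 100 * total / license_bytes
in_compliance = total <= license_bytes

print(total)                    # 150
print(f"{total / GIB:.2f}GB")   # 0.00GB
print(f"{utilization_pct:.0f}%")  # 0%
```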
Different EC2 node types
Is this the new data warehouse?
Designate specific nodes as a sub-cluster to isolate workloads and support multi-tenancy. Workload isolation lets you dedicate compute resources to different teams, functions, and tasks, so that one workload's performance is insulated from activity on other sub-clusters.
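The isolation idea can be shown with a tiny router that sends each team's queries only to nodes in its own sub-cluster. This illustrates the concept only and does not describe how Vertica assigns sessions to sub-clusters. The node names, sub-cluster names, team mapping and `SubclusterRouter` class are all hypothetical, and round-robin is an assumed balancing rule.

```python
from itertools import cycle

# Hypothetical layout: node and sub-cluster names are illustrative only.
SUBCLUSTERS = {
    "etl": ["node01", "node02"],
    "dashboards": ["node03", "node04", "node05"],
    "data_science": ["node06"],
}


class SubclusterRouter:
    """Route each team's queries only to nodes of its own sub-cluster, round-robin."""

    def __init__(self, subclusters: dict[str, list[str]], team_to_subcluster: dict[str, str]):
        self._nodes = {name: cycle(nodes) for name, nodes in subclusters.items()}
        self._team_to_subcluster = team_to_subcluster

    def route(self, team: str) -> str:
        # A team never receives a node from another sub-cluster, so a heavy ETL
        # load cannot consume the nodes serving dashboards.
        return next(self._nodes[self._team_to_subcluster[team]])


router = SubclusterRouter(SUBCLUSTERS, {"finance": "dashboards", "ingest": "etl", "ml": "data_science"})
print([router.route("finance") for _ in range(4)])  # ['node03', 'node04', 'node05', 'node03']
print(router.route("ingest"))                        # node01
```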
Here’s the Eon Architecture.
The optimizer and execution engine are almost entirely untouched by the Eon project, so you get all the Vertica goodness you've come to expect: full SQL, a cost-based optimizer, high-performance analytic functions, local joins, compressed storage, and more.
We’ve renamed the cache to the “depot” for reasons that will become obvious later.
There's a storage API that supports flexible deployments.
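The two ideas above are a node-local cache (the depot) sitting in front of shared storage, and a storage API that hides which backend holds the data. They can be sketched as a small cache over a pluggable interface. This is a generic illustration, not Vertica's code. `StorageBackend`, `InMemoryStore` and `Depot` are hypothetical names, and least-recently-used eviction by byte size is an assumed policy, not a description of the actual depot.

```python
from collections import OrderedDict
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical storage API: any shared store that can return a file's bytes."""

    def read(self, path: str) -> bytes: ...


class InMemoryStore:
    # Stand-in for shared storage; counts reads so cache hits are visible.
    def __init__(self, files: dict[str, bytes]):
        self.files = files
        self.reads = 0

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]


class Depot:
    """Node-local cache over a storage backend with LRU eviction by byte size."""

    def __init__(self, backend: StorageBackend, capacity_bytes: int):
        self.backend = backend
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: OrderedDict[str, bytes] = OrderedDict()

    def read(self, path: str) -> bytes:
        if path in self.entries:
            self.entries.move_to_end(path)  # hit: mark as most recently used
            return self.entries[path]
        data = self.backend.read(path)      # miss: fetch from shared storage
        if len(data) <= self.capacity:
            while self.used + len(data) > self.capacity:
                _, evicted = self.entries.popitem(last=False)  # drop least recently used
                self.used -= len(evicted)
            self.entries[path] = data
            self.used += len(data)
        return data


store = InMemoryStore({"a": b"x" * 40, "b": b"y" * 40, "c": b"z" * 40})
depot = Depot(store, capacity_bytes=80)
depot.read("a")
depot.read("b")
depot.read("a")             # served from the depot; no backend read
depot.read("c")             # evicts "b", the least recently used entry
print(store.reads)          # 3
print(list(depot.entries))  # ['a', 'c']
```

Because `Depot` depends only on the `read` method, swapping the backend (for example, a different object store) would not change the caching logic. That is the kind of flexibility a storage API is meant to provide.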
No current plans to support the WOS (Write Optimized Store) in Eon mode.
Good for OEMs and for those who use SQL (data warehousing).
Maximum of 1,600 features.
Biotech and genomics data have many more features.
The Voltage server must be purchased separately.
Not necessarily for the whole database.