Operating systems: Unix/Linux, IBM mainframe, Microsoft Windows/.NET
RDBMS: Oracle, MySQL, Derby, MariaDB, SQL Server, DB2, PostgreSQL
MPP Analytics: Netezza (nzLua extension), Teradata, Cassandra
NoSQL: HBase, AWS DynamoDB, Cassandra, MongoDB, Bigtable
Embedded SQL/programming: Pro*C, Cython, C, C++, C#
Python: IPython, Jupyter Notebook, PySpark, scikit-learn; .NET Framework 4.x, C#, SSIS, PowerShell; AWS, Azure
ETL: Informatica PowerCenter 8-10.x (IDQ, BDE, Data Lake, MDM), Talend, Pentaho, SSIS, DataStage
Search: Elasticsearch, Kibana, Lucene, Solr
Microsoft: Azure Data Factory, Databricks, SSIS, Azure Data Lake
Big Data Solution Architect
- Design, build, and manage analytics infrastructure used by data analysts, data scientists, and non-technical data consumers, enabling the analytics functions of the big data platform.
- Develop, construct, test, and maintain architectures, such as databases and large-scale processing systems, that help analyze and process data in accordance with Digital Analytics Insight requirements and needs.
- Collaborate closely with Data Scientists to convert existing legacy RDBMS models and new models into scalable Hadoop analytical solutions.
- Design, document, build, test, and deploy data pipelines that source data in a variety of formats (structured, semi-structured, and unstructured) into the Data Lake, Data Warehouse, and Enterprise Data Hub, enabling a unified view.
- Create data models that allow analytics and business teams to derive insights about customer behavior.
- Discover and catalog metadata about data stores into a central catalog, processing semi-structured data such as clickstream and process logs.
- Populate the AWS Glue Data Catalog with table definitions via scheduled crawler programs.
- Configure crawlers that invoke classifier logic to infer the schema, format, and data types of the data; this metadata is stored as tables in the AWS Glue Data Catalog and used when authoring ETL jobs.
- Using the BIAN framework, define Swagger-specified, LOB-dedicated APIs to address Data Governance and Stewardship, in alignment with TD Enterprise Architecture, spanning the Hadoop Data Foundation to the business layers.
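The classifier pass a crawler performs can be pictured as type inference over sampled records. The sketch below is a toy illustration of that idea for clickstream-style data, not AWS Glue's actual implementation; all names and the widening rule are illustrative assumptions.

```python
def infer_type(value):
    # Map a sample Python value to a simple catalog column type.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"

def infer_schema(records):
    # Derive column-name -> type from semi-structured records,
    # widening to "string" when samples disagree (toy policy).
    schema = {}
    for rec in records:
        for col, val in rec.items():
            t = infer_type(val)
            if schema.get(col, t) != t:
                t = "string"
            schema[col] = t
    return schema

clickstream = [
    {"user_id": 101, "page": "/home", "duration": 3.2},
    {"user_id": 102, "page": "/cart", "duration": 1.5},
]
print(infer_schema(clickstream))
# {'user_id': 'bigint', 'page': 'string', 'duration': 'double'}
```

In the real service, the inferred table definitions would then land in the Glue Data Catalog for ETL-job authoring.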
Big Data Lead
- Implemented and evolved existing models and the data dictionary to address new business needs on mission-critical enterprise initiatives: FATCA, AML, CM4, online credit transactions, and the like.
- Inventoried file systems to archive metadata across several layers of sources, in accordance with Federal Reserve Bank financial regulations and periodic reporting requirements (SOX, AML, and the like).
- Owned the full development cycle of Big Data solutions, including architecting data acquisition (into the Data Lake or Data Platform), standardization, validation, and visualization using the Hadoop ecosystem: HiveQL, Spark (Scala and Python), Flume, and the like.
- Created ETL data pipelines for transactional and non-transactional data across five stages of analysis and insight: acquire, prepare, analyze, validate, and operationalize.
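The five stages named above can be sketched as a chain of small functions. This is a minimal, self-contained illustration of the stage boundaries; the stubbed data, rejection policy, and publish step are assumptions, not the production pipeline.

```python
def acquire():
    # Stage 1: source raw transactional records (stubbed in-memory here).
    return [{"txn_id": 1, "amount": "120.50"}, {"txn_id": 2, "amount": "bad"}]

def prepare(records):
    # Stage 2: standardize types; drop rows that fail conversion
    # (in practice these would route to a reject/quarantine path).
    prepared = []
    for r in records:
        try:
            prepared.append({**r, "amount": float(r["amount"])})
        except ValueError:
            continue
    return prepared

def analyze(records):
    # Stage 3: derive a simple aggregate insight.
    return {"total": sum(r["amount"] for r in records), "count": len(records)}

def validate(summary):
    # Stage 4: guard against empty or negative results before publishing.
    assert summary["count"] > 0 and summary["total"] >= 0
    return summary

def operationalize(summary):
    # Stage 5: publish to a downstream store or dashboard (stubbed as return).
    return summary

result = operationalize(validate(analyze(prepare(acquire()))))
print(result)  # {'total': 120.5, 'count': 1}
```

Keeping each stage a pure function makes the pipeline easy to test stage-by-stage before wiring it into Spark or an orchestrator.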