- Strong track record of success creating big data solutions for key business initiatives in alignment with analytics architecture and future state vision.
- Seasoned information technology professional skilled in business analysis, business intelligence, data modeling, data architecture, and data warehousing.
- Proven ability to deliver on organization mission, vision, and priorities.
- Extensive hands-on experience leading multiple data architecture projects, gathering business requirements, analyzing source systems, and designing data strategies for dimensional, analytical, transactional, and operational data systems.
- Systems/Software Engineering
- Requirements Analysis
- Database Development
- Blockchain Programming
- Shell Scripting
- Agile Methodologies
- Machine Learning/Deep Learning
- Business Development
- Leadership/Team Training and Support
- Project/Vendor Management
Big Data Technologies: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Impala, Oozie, Flume, ZooKeeper, Kafka, NiFi, HBase, MongoDB, StreamSets, Talend, Splunk, Kibana, Logstash, Elasticsearch, Kudu
Spark Components: Spark, Spark SQL, Spark Streaming, and Spark MLlib
Cloud Services: AWS, S3, EBS, EC2, VPC, Redshift, EMR, Azure, CloudFront, Glue, Athena
Artificial Intelligence: Machine Learning, Deep Learning, TensorFlow, scikit-learn, SageMaker, Keras, PyTorch
Blockchain: Ethereum, Cardano, R3, Hyperledger, Smart Contracts
Programming Languages: Java, Python, Scala, R, Solidity
Scripting/Query Languages: Unix shell scripting, SQL, PL/SQL
Databases: Oracle, MySQL, SQL Server, Netezza, Teradata
Other: Maven, Eclipse, PyCharm, RStudio, Jupyter, Zeppelin, Tableau, GitHub, Jenkins, Bitbucket, Bamboo, Jira, TFS, VSTS, Docker, AutoSys, Control-M
Confidential, Neptune, NJ
- Performed complex upserts with Kudu and MongoDB on large data volumes derived from various sources; processed large-scale electronic medical and financial records and daily and monthly data sets stored in Amazon S3, Redshift, HDFS, and Blob Storage.
- Develop predictive models using machine learning algorithms such as linear regression, logistic regression, and decision trees.
- Develop and design extract, transform, load (ETL) applications using big data technology and automate them using Oozie, Control-M, AutoSys, and shell scripts.
- Utilize Jenkins and Bamboo for continuous integration, building project code before deployment.
- Set up and run data ingestion using StreamSets and NiFi for various data formats and sources.
- Mentor and supervise on-site employees and outsourced/off-site personnel; write and update guidelines and protocols for teams to complete objectives.
- Independently built Lambda and Kappa architecture solutions for on-premises, hybrid, and cloud environments; also designed an API with Docker that connects to MongoDB as a source for both on-premises and cloud deployments.
- Transformed unstructured data into structured data with Apache Spark, utilizing DataFrames and querying across data sources such as S3, Redshift, Hive, Impala, Kudu, and MongoDB.
- Built an Ethereum blockchain and deployed smart contracts on a private network; also built an application with Hyperledger Fabric using Hyperledger Composer in Bluemix.
- Conceptualized and created models using machine-learning regression techniques.
Confidential, Charlotte, NC
- Led capacity planning of Hadoop clusters based on application requirements.
- Oversaw several Hadoop clusters and other Hadoop ecosystem services in development and production environments.
- Contributed to evolving architecture of company services to meet changing requirements for scaling, reliability, performance, manageability, and pricing.
- Developed, designed, and automated ETL applications utilizing Oozie workflows and shell scripts.
- Created Sentry policy files for business users in development, user acceptance testing, and production environments to provide access to required databases and tables in Impala; also designed and incorporated security processes, policies, and guidelines for cluster access.
- Converted copybook files from EBCDIC, ASCII, and binary formats; stored the files in HDFS; created Hive tables to decommission mainframes and make Hadoop the primary source for exports to mainframes.
Confidential, Springfield, IL
- Pulled data from relational database management systems (RDBMS) such as Teradata, Netezza, Oracle, and MySQL utilizing Sqoop; stored the data in the Hadoop Distributed File System (HDFS).
- Developed and deployed an internal shell-script tool that compares RDBMS and Hadoop data to verify that all data in source and target match.
- Created external Hive tables to store and run queries on loaded data.
- Architected, implemented, and tested data analytics pipelines with Hortonworks/Cloudera.
- Implemented partitioning and bucketing techniques for external tables in Hive, improving space and performance efficiency.