Big Data, Team Lead Resume
SUMMARY:
Over the past 12 years, my primary focus has been on Hadoop Big Data, Apache Spark, containerization, and data science. I have been a member of both Cloudera and Microsoft Professional Services. As a member of Cloudera Professional Services, I led and completed over 20 implementations of Cloudera (CDH/CDP) products. I have hands-on experience with streaming technologies such as Kafka, Flume, and Spark, and with visualization tools such as Tableau, Arcadia, and Power BI, along with a strong understanding of Docker and Kubernetes containers. I have worked as Big Data team lead on large accounts and have provided Big Data training and mentoring to junior team members. I also have hands-on experience with SQL, Spark, PySQL, and Python programming.
CORE COMPETENCIES:
Hadoop Distribution: Cloudera (CDH/CDP), Azure HDInsight, and Hortonworks
Big Data: HDFS, NFS, HBase, MapReduce, Cloudera Manager, Ambari, MapR Control System, Cloudera Navigator, Machine Learning, YARN, Hue, Hive, Impala, Pig, Sqoop, Flume, Kafka, ADF (Data Factory), Oozie, ZooKeeper, Spark, PySpark, Ganglia, Nagios, Avro, AWS, DevOps tools (Chef, Puppet), Kerberos, Knox, Ranger, NiFi, Tez, Sentry, LDAP, and AD.
Databases: HBase, PostgreSQL, Oracle, DB2, SQL Server
Data Analysis: Python certification
Libraries: NumPy, Pandas, SciPy, Scikit-learn
Notebooks and IDEs: Jupyter Notebooks, RStudio IDE, Apache Zeppelin
Programming: Python certification
UNIX: shell scripting, Azure CLI, Python, and PowerShell.
WORK HISTORY:
Confidential
Big Data, Team Lead
Responsibilities:
- Led and managed the Big Data team, which was responsible for builds, expansions, upgrades, security, and integration of all analytical tools such as Arcadia.
- Served as a subject matter expert in Big Data and its ecosystem components in a multi-tenant cluster for Kafka, HDFS, Flume, Spark, Hive, HBase, and YARN.
- Ran automated jobs via UNIX cron and Control-M scheduling.
- Provided thought leadership and technical oversight to ensure Big Data clusters met business requirements and standards for internal and external customers.
- Formulated and implemented the data strategy in line with global teams.
- Established strong relationships with the global application teams and stakeholders and ensured transparency of deliverables.
- Ensured production stability, data backup and restoration, and high availability of services as the key focus areas in all deliveries.
- Tuned the Kafka production cluster by adjusting configuration parameters such as num.partitions.
- Configured HBase to use HDFS high availability, cleaned up split logs, and added a Kafka topic to the entity ‘mk consumer config’ in HBase.
- Secured the Sqoop2 server by enabling Kerberos authentication and SSL encryption, and resolved firewall issues when connecting to RDBMS sources to extract data via Sqoop2.
- Upgraded Cloudera Manager and the cluster to 6.3.x and documented the whole process.
- Enabled Spark encryption using Cloudera Manager to encrypt Spark data at rest and in transit.
- Worked with internal & external stakeholders in IT, vendors & businesses to collaboratively develop and drive the strategy & roadmap for the services in scope.
- Served as a subject matter expert (SME) in Big Data technologies, data security, encryption (at rest and in transit), TLS, and task estimation.
- Created and implemented Disaster recovery and Data Backup plans.
- Worked with various distributed file formats such as Avro and Parquet and with common data transformation methods.
- Utilized Automation tools such as Ansible, Chef, and shell scripting for configuring security for both Sentry and Ranger.
- Worked with various persistent storage systems such as MySQL, HDFS, and PostgreSQL databases.
- Worked with streaming technologies such as Kafka, Flume, and Spark Streaming following the Lambda architecture, including troubleshooting and debugging.
- Worked with analytical tools such as Arcadia and Power BI for integration with Hadoop, and set up queues and quotas per business requirements in a multi-tenant cluster.
- Utilized PySpark, Spark SQL, and Impala; troubleshot issues and recommended best practices for these technologies based on use-case scenarios.
- Applied general operational expertise in system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and network troubleshooting.
- Drove root cause analysis of system vulnerabilities to identify mitigation strategies.
Confidential
Technical Lead - Azure Cloud (IaaS)
Responsibilities:
- Participated in administrative activities, including HDInsight Spark/HBase/Kafka/Interactive Query (LLAP) cluster deployment using ARM templates and the Azure Portal, creating resource groups and NSG rules, scaling clusters, and creating Blob and ADLS storage accounts. Handled day-to-day operational activities such as monitoring jobs, giving recommendations for skewed jobs, and resolving service-related issues; tuned and checked the configuration of services such as YARN, Kafka, Impala, Spark, and Hive; and set up processes for user onboarding and metadata backup.
- Played an architect role in setting up processes including user onboarding and application onboarding, selecting the optimal SKU for Spark/HBase/Kafka clusters from lower to higher environments, and recommending changes from default to non-default configuration values for components such as Spark2, MapReduce2, Hive, LLAP, and Queue Manager to enhance cluster performance.
- Coordinated with the Microsoft/Hortonworks support team and various production support teams.
- Set up the Nagios alert system to collect all possible/reasonable metrics and alert only on those that require action.
- Set up Kafka performance flags and alerts to address producers and consumers falling out of sync.
- Installed and configured Cloudbreak in a VM using Azure cloud resources.
- Set up multiple Cloudbreak blueprints, recipes, and management packs, and configured external databases for Ranger.
- Helped set up the Azure subscription, interactive credential, and VNet/subnet.
- Set up an ADLS Gen2 storage account with two file systems, storage-fs and logs-fs.
- Created Managed Identities for Data Lake Admin, Assumer, Ranger Audit Logger and Logger.
- Created a cluster template (blueprint) for our application from an existing cluster template.
- Used Management Console to register the clusters and build new clusters.
- Used Data Hub to configure cluster topology (master, worker, and compute) and cloud storage for HDFS, YARN, and Zeppelin.
- Used Replication Manager to register the existing clusters and copy the HDFS data.
Confidential
Sr. Cloudera Engineer
Responsibilities:
- Assisted with monitoring and troubleshooting of Hive, HBase, Kafka, and Hadoop HDFS.
- Utilized Oracle databases for storing metadata for Ranger, Oozie, and Hive storage.
- Performed data migrations from on-prem to Azure Data Factory and Azure Data Lake.
- Created Python programs to automate manual processes.
- Performed configuration management utilizing Chef and Ansible. Managed version control with Git and Bitbucket for software and documentation.
- Performed day-to-day cluster management and security utilizing TLS, Kerberos, Cloudera
- Configured and optimized HDFS, YARN, Sentry, Hue, Navigator, Impala, and Spark
- Utilized Sqoop and Flume to ingest data into HDFS from relational databases.
- Performed troubleshooting on SQL Server Integration Services (SSIS) and ETL packages.
- Scheduled Oozie workflows to generate the monthly report files automatically.
- Created backup and recovery solutions using Cloudera Enterprise Backup and Disaster Recovery (BDR) and snapshots.
Confidential
Sr. Cloud Engineer
Responsibilities:
- Performed performance tuning on Apache Pig, Hive, and HBase to improve MapReduce job performance.
- Designed and developed a real-time stream processing application using Kafka and Hive to perform streaming ETL and apply machine learning.
- Performed security configurations with Ranger, Kerberos, and HDFS commands.
- Performed troubleshooting on Azure storage and cluster builds.
- Created Azure Data Lakes and Data Factories for HDInsight cluster builds.
- Worked with HiveQL and Spark to transform data from HDFS and Kafka.
- Utilized NoSQL database HBase tables to store Internet of Things (IoT) device information.
- Extracted datasets from Excel and RDBMS databases and performed data cleaning, DataFrame manipulation, and summarization using Python.
- Built machine learning regression models and data pipelines using Python libraries.
Machine Learning:
- Researched, designed, and prototyped robust and scalable models based on machine learning, data mining, and statistical modeling to answer key business problems.
- Worked with development teams & business groups to ensure models can be implemented as part of a delivered solution replicable across departments.
- Converted datasets into actionable models to predict and analyze habits, budget population segmentation, and population classification.
- Utilized Jupyter Notebooks, RStudio IDE, and Apache Zeppelin for developing in Python and creating predictive models and visualizations.
Confidential, Atlanta, GA
Big Data Engineer
Responsibilities:
- Served as a primary participant in certifying Big Data products for use within the tenancy group, making new cluster evaluation recommendations, preparing RAM estimates, and upgrading Cloudera CDH Hadoop clusters.
- Actively participated in COB (Continuity of Business) switchovers, including COB cluster checkout and COB cluster testing for all components such as the Hadoop cluster, MySQL, Flume, Datameer, Platfora, data ingestion, Talend, and data center failover.
- Set up, configured, and documented Hive Metastore high availability (HA).
- Served as a primary participant in Sqoop setup, secured the Sqoop2 server, and used Sqoop on a Sentry-enabled cluster.
- Built the BDR requirement template and the COB business recovery plan template.
- Installed Kafka, enabled SSL and high availability on Kafka brokers, ingested Kafka topics from multiple clusters into one cluster using the MirrorMaker service, and handled multiple Kafka failure issues.
- Configured HBase to use HDFS high availability, cleaned up split logs, and added a Kafka topic to the entity ‘mk consumer config’ in HBase.
- Secured the Sqoop2 server by enabling Kerberos authentication and SSL encryption, and resolved firewall issues when connecting to RDBMS sources to extract data via Sqoop2.
- Upgraded Cloudera Manager and the cluster and documented the whole process.
- Enabled Spark encryption using Cloudera Manager to encrypt Spark data at rest and in transit.
