
Lead Data Engineer Resume


SUMMARY

  • Strong decision-maker with more than 19 years of experience in software engineering. Considered an effective coach and mentor, and committed to leading development and administrative teams. Looking to work closely with both teams and customers to find the most efficient and beneficial solutions for process improvement.
  • Self-directed and motivated Senior Software Engineer who works effectively in dynamic environments.
  • Versatile Senior Manager specializing in Hadoop development and administration and skilled at planning, implementing and overseeing key improvements to drive business growth and efficiency. History of cultivating an open culture with free exchange of information. Pursuing new professional challenges with a growth-oriented company.
  • Avid technologist interested in emerging technology and seeking new, innovative ways to solve the impossible

TECHNICAL SKILLS

Computer Skills: ETL, SQL, PL/SQL, MS SQL, MySQL, HTML, Perl, Python, Shell scripting, Hadoop Administration, AWS, Google Cloud, Spark / Scala, PySpark, Pig, Hive, HBase, Cassandra, MongoDB, Snowflake, Storm, Kafka, Sqoop, Oozie, Flume, Knox, Ranger, HUE, Ambari Views

Operating Systems: UNIX (Sun and HP), Linux (RedHat, CentOS, Ubuntu), Windows OS, Apple OS

Development Software: Informatica, Cognos, Business Objects, Xcelsius, Object Team, Adobe, TOAD (for Hadoop), SAP Business Objects Web Intelligence tools, RStudio, AtScale, Druid, Superset, Tableau, Microsoft Visio, Microsoft Office software, and SPSS.

PROFESSIONAL EXPERIENCE

Confidential

Lead Data Engineer

Responsibilities:

  • Define the UI strategy and strategic web development approach
  • Create SQL and PL/SQL procedures, functions, stored procedures, and packages within the mappings.
  • Tune Informatica mappings and sessions for optimum performance.
  • Develop Hive ETL jobs in Scala
  • Solve complex scenarios and coordinate with source system owners on day-to-day ETL progress monitoring.
  • Implement Spring Boot microservices to process messages into the Kafka cluster.
  • Gather business requirements and guide the offshore team in a timely fashion.
  • Set up Kafka clusters in the QA and production environments.
  • Identify the Kafka message failure scenarios.
  • Implement reprocessing of failed Kafka messages using their offset IDs (see the sketch after this list).
  • Implement Kafka producer and consumer applications on the Kafka cluster with the help of ZooKeeper.
  • Use Spring Kafka API calls to process messages smoothly on the Kafka cluster.
  • Coordinate test implementation with the development team
  • Develop backend database model
  • Perform unit testing and debugging
  • Develop management strategies
  • Develop code repository strategy
  • Implement production support strategy
  • Support scheduled jobs in the production environment
  • Perform DevOps deployment of application code using Jenkins and IntelliJ
  • Perform GCP software test integration and data analytics using BigQuery
  • AWS IAM and EMR deployment
  • AWS Snowflake data warehouse
  • AWS S3 setup and configuration (buckets)
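
The offset-based reprocessing above is sketched below in Python using kafka-python; the production work used Spring Kafka on the JVM, and the broker address, topic name, and failed offsets here are illustrative assumptions only.

    # Reprocess failed Kafka messages by seeking back to their recorded offsets.
    # Broker, topic, and offset values are placeholders, not project configuration.
    from kafka import KafkaConsumer, TopicPartition

    BROKERS = ["localhost:9092"]            # replace with the cluster's bootstrap servers
    TOPIC = "orders"                        # hypothetical topic name
    FAILED = {0: [1542, 1587], 1: [203]}    # partition -> offsets captured at failure time

    consumer = KafkaConsumer(bootstrap_servers=BROKERS, enable_auto_commit=False)

    for partition, offsets in FAILED.items():
        tp = TopicPartition(TOPIC, partition)
        consumer.assign([tp])               # manual assignment so the position is controlled
        for offset in sorted(offsets):
            consumer.seek(tp, offset)       # jump directly to the failed message
            record = next(consumer)         # re-read exactly one record from that offset
            print(f"reprocessing partition={partition} offset={record.offset}")
            # ... hand the record back to the normal processing path here ...

    consumer.close()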

Confidential

Big Data Engineer

Responsibilities:

  • Develop SQL queries to extract data for analysis and model construction
  • Create technical design document for data extraction
  • Create Run Book for data extraction support
  • Perform unit testing and debugging
  • Scope required code completion time
  • Develop the required Scala, Python, and PySpark ETL framework for data extraction and bulk extraction (a sketch follows this list)
  • Interact with various teams to refine and clarify data extraction requirements
  • Support scheduled jobs in the production environment
  • Perform architecture design, data modeling, and implementation
  • Assist various business groups with document organization and dissemination during software deployment.
  • Work with the MongoDB document store
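
A minimal PySpark sketch of the extraction/bulk-extraction framework referenced above; the JDBC source, table name, and S3 output path are illustrative assumptions rather than the actual project configuration.

    # Bulk-extract a source table over JDBC and land it as partitioned Parquet.
    # Connection details, table name, and output path are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bulk-extract").getOrCreate()

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/SVC")   # hypothetical source
          .option("dbtable", "SALES.TRANSACTIONS")                # hypothetical table
          .option("user", "etl_user")
          .option("password", "********")
          .option("fetchsize", "10000")                           # larger fetches for bulk reads
          .load())

    # Light cleanup, then write Parquet partitioned by a date column for downstream analysis.
    (df.dropDuplicates()
       .write.mode("overwrite")
       .partitionBy("TXN_DATE")                                   # hypothetical partition column
       .parquet("s3a://example-bucket/extracts/transactions/"))

    spark.stop()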

Confidential

Lead Hadoop Architect

Responsibilities:

  • Responsible for the implementation, support, and management of the enterprise Hadoop analytic data lake environments. This involves design, capacity planning, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration, as well as the implementation of peripheral software for analytics and development.
  • Support Development, Test & Production environments.
  • Perform architecture design, data modeling, and implementation of Data Lake as a Big Data platform to drive Personalization, marketing and analytics applications
  • Design, architect, and build a highly scalable and extensible Customer Data Lake to enable collection, storage, modeling, and analysis of massive customer data comprising tags, webserver logs, social media, and related data elements in its raw format, and make it accessible for data mining and analytics
  • Demonstrated ability to communicate highly technical concepts in business terms and articulate the value of adopting Big Data technologies.
  • Communicate with business stakeholders and explain technical concepts to a non-technical audience as needed
  • Hands-on experience in the architecture and implementation of large and highly complex projects
  • End-to-end experience in large-scale data lake / data warehouse projects
  • Hands-on experience in Apache Big Data Components/Frameworks
  • Create run book schedules for nightly jobs to meet business data load requirements.
  • Produce daily cluster reports and documents for senior team members.
  • Worked directly with departments, clients, management, and vendors to achieve project goals.
  • Assisted various business groups with document organization and dissemination during software deployment.
  • Supported Chief Operating Officer with daily operational functions.
  • Support Senior Data Architect with data consolidation efforts.
  • Support Senior IT Vice President with budget planning for Hadoop clusters
  • Attend weekly change review meetings to discuss cluster planned configuration changes to improve performance or correct defects.
  • Create and implement scripts to manage cluster logs.
  • Present cluster engineering plans to senior management
  • Troubleshoot memory issues and utilization aspects of the clusters.
  • Administer and maintain platforms such as Syncsort and AtScale.
  • HBase database administration
  • Perform ongoing implementations, administration, maintenance of the platforms.
  • Document and formalize processes for the environments.
  • Document processes for the administration of Hadoop platforms.
  • Maintain a complete understanding of all applications in the Hadoop ecosystem.
  • Install and maintain Apache NiFi clusters
  • Implement shell scripts to maintain log rotation on several application levels and for different platform configurations.
  • Implement shell scripts to provide daily health reports and service monitoring on the cluster (see the sketch after this list).
  • Install AtScale development software
  • Implement APIs to schedule AtScale relational cube builds
  • Engineer and develop run books for scheduled AtScale job execution.
  • Create shell scripts that call the AtScale APIs to run scheduled AtScale jobs.
  • Integrate with development and management teams to meet project goals.
  • Collaborate with senior management to strategize cloud-to-on-premises integration of the Hadoop and Apache NiFi clusters.
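
A minimal sketch of the daily cluster health report mentioned above, written in Python against the Ambari REST API (the originals were shell scripts); the Ambari host, cluster name, and credentials are illustrative placeholders.

    # Pull service states from the Ambari REST API and print a daily health summary.
    # Host, cluster name, and credentials are placeholders for illustration.
    import requests

    AMBARI = "http://ambari-host:8080"      # hypothetical Ambari server
    CLUSTER = "prod_cluster"                # hypothetical cluster name
    AUTH = ("admin", "********")

    url = f"{AMBARI}/api/v1/clusters/{CLUSTER}/services?fields=ServiceInfo/state"
    resp = requests.get(url, auth=AUTH, headers={"X-Requested-By": "ambari"}, timeout=30)
    resp.raise_for_status()

    print(f"Daily health report for {CLUSTER}")
    for item in resp.json().get("items", []):
        info = item["ServiceInfo"]
        flag = "OK" if info["state"] == "STARTED" else "WARN"
        print(f"  [{flag:<4}] {info['service_name']:<16} state={info['state']}")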

Confidential

Sr Data Engineer, Senior Staff

Responsibilities:

  • Manage pre-prod and production clusters using Cloudera Manager on AWS to quickly identify failures and resource issues.
  • Contribute to the vision, deployment and administration of Hadoop enabled infrastructure as a core service to all business functions within the organization using AWS infrastructure.
  • Enable user access and manage corporate policies using AWS IAM.
  • Build data expertise and own data quality for ingestion pipelines.
  • Interface with engineers, product managers and product analysts to understand data needs.
  • Architect, build and launch new data models that provide intuitive analytics to end users.
  • Design, build and launch extremely efficient & reliable data pipelines to move data (both large and small amounts) to the Data Warehouses.
  • Design, build and launch new data extraction, transformation and loading processes in production.
  • Create new systems and tools to enable the customer to consume and understand data faster.
  • Apply coding skills across a number of languages, including SQL, Scala, Python, and PySpark
  • Work across multiple teams in high visibility roles and own the solution end-to-end providing advice on the capabilities within AWS.
  • Support the administration of on premise and AWS cloud-based Hadoop clusters.
  • Make information available to large scale, next generation, predictive analytics applications.
  • Support and integrate big data tools/frameworks.
  • Build, implement, and support the data infrastructure; ingest and transform data (ETL/ELT processes).
  • Integrate Apache Kafka for data ingestion
  • Design and implement topic configuration for the new Kafka cluster in all environments (a sketch follows this list)
  • Install Kafka Manager to track consumer lag and monitor Kafka metrics
  • Be available on call (on rotation) in a support role.
  • Manage and participate in the day-to-day operational work across the Hadoop clusters.
  • Work closely with hosted operations colleagues to define operational best practices for the UAT and production Hadoop clusters.
  • Participate in project planning and reviews as they pertain to Hadoop and the Hadoop clusters.
  • Serve as the main point of contact for vendor escalation on open issues related to cluster maintenance, performance, and upgrades.
  • Participate in interviewing candidates to fill open Hadoop development and administration positions.
  • Verify data integrity and accuracy.
  • Validate schematic designs working alongside Hadoop data engineers.
  • Work closely with offshore teams to meet project objectives and support cluster maintenance.
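
A minimal sketch of per-environment topic configuration as referenced above, using the kafka-python admin client in Python; broker addresses, the topic name, and the partition, replication, and retention settings are illustrative assumptions.

    # Create a topic with environment-specific partition and replication settings.
    # Broker lists, topic name, and configs are placeholders, not project values.
    from kafka.admin import KafkaAdminClient, NewTopic

    ENVIRONMENTS = {
        "uat":  {"brokers": ["uat-kafka:9092"],  "partitions": 3,  "replication": 2},
        "prod": {"brokers": ["prod-kafka:9092"], "partitions": 12, "replication": 3},
    }

    for env, cfg in ENVIRONMENTS.items():
        admin = KafkaAdminClient(bootstrap_servers=cfg["brokers"],
                                 client_id=f"topic-setup-{env}")
        topic = NewTopic(
            name="customer-events",                       # hypothetical topic
            num_partitions=cfg["partitions"],
            replication_factor=cfg["replication"],
            topic_configs={"retention.ms": "604800000"},  # 7-day retention
        )
        admin.create_topics([topic])                      # fails if the topic already exists
        print(f"created customer-events in {env}")
        admin.close()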

Confidential

Sr Hadoop Administrator

Responsibilities:

  • Collaborate with Think Big's Engineering, Data Science, and Customer teams to scope, size, install, and configure components of the open source Hadoop Big Data platform in AWS, Teradata, or RHEL environments.
  • Install peripheral tools such as HUE, RStudio and OpenLDAP.
  • Manage users on the Hadoop platform with enterprise Active Directory.
  • Encrypt data at rest and in motion.
  • Serve as an escalation point for the Think Big Managed Services teams that provide on-going operational support for Think Big's customers.
  • Collaborate with Think Big's project leads to develop project level scoping plan.
  • Research tools to accommodate customer requirements.
  • Develop test plans for initial Hadoop services component testing.
  • Develop POC application projects using Hive, HBase, Pig, PySpark, and Oozie.
  • Perform system administration tasks, including using Ambari to install and provision Hadoop clusters, onboard users to the Hadoop cluster, and set up High Availability (HA) for key components such as the NameNode, Resource Manager, and any other component identified as critical to the customer or use cases.
  • Work with Think Big's operations practice leads to develop and ensure consistent deployment of best practices across all of Think Big's projects.
  • Design and develop tools to support proactive administration and monitoring of the open source Hadoop Big Data platform.
  • Support sales efforts scoping engagements and developing statements of work.
  • Apply tuning to components such as Hive, HBase and Spark to enhance performance.
  • Familiar with the use of Apache Ranger for user authorization, access, and for auditing on Hortonworks data platform and the use of Apache Knox for Hadoop perimeter gateway access.
  • Use Linux shell scripting where necessary to automate tasks or to fill gaps where tools cannot perform the tasks.
  • Familiar with Hadoop streaming components such as Kafka, Spark, Storm, and Flink.
  • Integrate with client development teams to build applications and data pipelines based on specific use case scenarios.
  • Familiar with cloud computing environments such as AWS and Google Cloud.
  • Collaborate with systems administrators and architects when necessary to perform system designs.
  • Familiar with Virtual Machine tools such as VMWare and VirtualBox.
  • Worked with clients on both Hortonworks and Cloudera Hadoop distributions.
  • Using both Ambari and Cloudera Manager to install and manage clusters.
  • Designed and implemented topic configuration for the new Kafka cluster in all environments.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.
  • HBase database administration
  • GCP platform implementation for data analytics
  • AWS IAM and EMR deployment for Hadoop services and access
  • AWS Snowflake data warehouse migration from onsite to cloud environment
  • AWS S3 setup and configuration (buckets); see the sketch below
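
A minimal boto3 sketch of the S3 bucket setup and configuration noted above; the bucket name and region are illustrative assumptions, and a production setup would add bucket policies, lifecycle rules, and tagging as required.

    # Create an S3 bucket, enable default encryption, and block public access.
    # Bucket name and region are placeholders for illustration.
    import boto3

    REGION = "us-east-1"
    BUCKET = "example-datalake-raw"       # hypothetical bucket name

    s3 = boto3.client("s3", region_name=REGION)
    s3.create_bucket(Bucket=BUCKET)       # us-east-1 requires no LocationConstraint

    # Default server-side encryption for all new objects.
    s3.put_bucket_encryption(
        Bucket=BUCKET,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
        },
    )

    # Block all forms of public access on the bucket.
    s3.put_public_access_block(
        Bucket=BUCKET,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )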
