
Lead Data Engineer Resume


SUMMARY

  • Strong decision-maker with more than 19 years of experience in software engineering. Considered an effective coach and mentor, and committed to leading development and administrative teams. Looking to work closely with both teams and customers to find the most efficient and beneficial solutions for process improvement.
  • Self-directed and motivated Senior Software Engineer who works effectively in dynamic environments.
  • Versatile Senior Manager specializing in Hadoop development and administration and skilled at planning, implementing and overseeing key improvements to drive business growth and efficiency. History of cultivating an open culture with free exchange of information. Pursuing new professional challenges with a growth-oriented company.
  • Avid technologist interested in emerging technology and seeking new, innovative ways to solve the impossible

TECHNICAL SKILLS

Computer Skills: ETL, SQL, PL/SQL, MS SQL, MySQL, HTML, Perl, Python, Shell scripting, Hadoop Administration, AWS, Google Cloud, Spark / Scala, PySpark, Pig, Hive, HBase, Cassandra, MongoDB, Snowflake, Storm, Kafka, Sqoop, Oozie, Flume, Knox, Ranger, HUE, Ambari Views

Operating Systems: UNIX (Sun and HP), Linux (RedHat, CentOS, Ubuntu), Windows OS, Apple OS

Development Software: Informatica, Cognos, Business Objects, Xcelsius, Object Team, Adobe, TOAD (for Hadoop), SAP Business Objects Web Intelligence tools, RStudio, AtScale, Druid, Superset, Tableau, Microsoft Visio, Microsoft Office software, and SPSS.

PROFESSIONAL EXPERIENCE

Confidential

Lead Data Engineer

Responsibilities:

  • Define the UI strategy and strategic web development approach
  • Create SQL and PL/SQL procedures, functions, stored procedures, and packages within the mappings.
  • Tune Informatica mappings and sessions for optimum performance.
  • Develop Hive ETL jobs in Scala
  • Solve complex scenarios and coordinate with source system owners on day-to-day ETL progress monitoring.
  • Implement Spring Boot microservices to process messages into the Kafka cluster.
  • Gather business requirements and guide the offshore team in a timely fashion.
  • Set up Kafka clusters in the QA and production environments.
  • Identify the Kafka message failure scenarios.
  • Implement reprocessing of failed Kafka messages using their offset IDs (see the sketch after this list).
  • Implement Kafka producer and consumer applications on the Kafka cluster with the help of ZooKeeper.
  • Use Spring Kafka API calls to process messages smoothly on the Kafka cluster.
  • Coordinate test implementation with the development team
  • Develop backend database model
  • Perform unit testing and debugging
  • Develop management strategies
  • Develop code repository strategy
  • Implement production support strategy
  • Support scheduled jobs in the production environment
  • Perform DevOps deployment of application code using Jenkins and IntelliJ
  • Perform GCP software test integration and data analytics using BigQuery
  • AWS IAM and EMR deployment
  • AWS Snowflake data warehouse
  • AWS S3 setup and configuration (buckets)
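
The offset-based reprocessing above is sketched below in Python using kafka-python; the production work used Spring Kafka on the JVM, and the broker address, topic name, and failed offsets here are illustrative assumptions only.

    # Reprocess failed Kafka messages by seeking back to their recorded offsets.
    # Broker, topic, and offset values are placeholders, not project configuration.
    from kafka import KafkaConsumer, TopicPartition

    BROKERS = ["localhost:9092"]            # replace with the cluster's bootstrap servers
    TOPIC = "orders"                        # hypothetical topic name
    FAILED = {0: [1542, 1587], 1: [203]}    # partition -> offsets captured at failure time

    consumer = KafkaConsumer(bootstrap_servers=BROKERS, enable_auto_commit=False)

    for partition, offsets in FAILED.items():
        tp = TopicPartition(TOPIC, partition)
        consumer.assign([tp])               # manual assignment so the position is controlled
        for offset in sorted(offsets):
            consumer.seek(tp, offset)       # jump directly to the failed message
            record = next(consumer)         # re-read exactly one record from that offset
            print(f"reprocessing partition={partition} offset={record.offset}")
            # ... hand the record back to the normal processing path here ...

    consumer.close()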

Confidential

Big Data Engineer

Responsibilities:

  • Develop SQL queries to extract data for analysis and model construction
  • Create technical design document for data extraction
  • Create Run Book for data extraction support
  • Perform unit testing and debugging
  • Scope required code completion time
  • Develop the required Scala, Python, and PySpark ETL framework for data extraction and bulk extraction (a sketch follows this list)
  • Interact with various teams to refine and clarify data extraction requirements
  • Support scheduled jobs in the production environment
  • Perform architecture design, data modeling, and implementation
  • Assist various business groups with document organization and dissemination during software deployment.
  • Work with the MongoDB document store
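
A minimal PySpark sketch of the extraction/bulk-extraction framework referenced above; the JDBC source, table name, and S3 output path are illustrative assumptions rather than the actual project configuration.

    # Bulk-extract a source table over JDBC and land it as partitioned Parquet.
    # Connection details, table name, and output path are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bulk-extract").getOrCreate()

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/SVC")   # hypothetical source
          .option("dbtable", "SALES.TRANSACTIONS")                # hypothetical table
          .option("user", "etl_user")
          .option("password", "********")
          .option("fetchsize", "10000")                           # larger fetches for bulk reads
          .load())

    # Light cleanup, then write Parquet partitioned by a date column for downstream analysis.
    (df.dropDuplicates()
       .write.mode("overwrite")
       .partitionBy("TXN_DATE")                                   # hypothetical partition column
       .parquet("s3a://example-bucket/extracts/transactions/"))

    spark.stop()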

Confidential

Lead Hadoop Architect

Responsibilities:

  • Responsible for the implementation, support, and management of the enterprise Hadoop analytic data lake environments. This involves design, capacity planning, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration, as well as the implementation of peripheral software for analytics and development.
  • Support Development, Test & Production environments.
  • Perform architecture design, data modeling, and implementation of Data Lake as a Big Data platform to drive Personalization, marketing and analytics applications
  • Design, architect, and build a highly scalable and extensible Customer Data Lake to enable collection, storage, modeling, and analysis of massive customer data comprising tags, webserver logs, social media, and related data elements in its raw format, and make it accessible for data mining and analytics
  • Demonstrated ability to communicate highly technical concepts in business terms and articulate the value of adopting Big Data technologies.
  • Communicate with business stakeholders and explain technical concepts to a non-technical audience as needed
  • Hands-on experience in the architecture and implementation of large and highly complex projects
  • End-to-end experience in large-scale data lake / data warehouse projects
  • Hands-on experience in Apache Big Data Components/Frameworks
  • Create run book schedules for nightly jobs to meet business data load requirements.
  • Produce daily cluster reports and documents for senior team members.
  • Worked directly with departments, clients, management, and vendors to achieve project goals.
  • Assisted various business groups with document organization and dissemination during software deployment.
  • Supported Chief Operating Officer with daily operational functions.
  • Support Senior Data Architect with data consolidation efforts.
  • Support Senior IT Vice President with budget planning for Hadoop clusters
  • Attend weekly change review meetings to discuss cluster planned configuration changes to improve performance or correct defects.
  • Create and implement scripts to manage cluster logs.
  • Present cluster engineering plans to senior management
  • Troubleshoot memory issues and utilization aspects of the clusters.
  • Administer and maintain platforms such as Syncsort and AtScale.
  • HBase database administration
  • Perform ongoing implementations, administration, maintenance of the platforms.
  • Document and formalize processes for the environments.
  • Document processes for the administration of Hadoop platforms.
  • Maintain a complete understanding of all applications in the Hadoop ecosystem.
  • Install and maintain Apache NiFi clusters
  • Implement shell scripts to maintain log rotation on several application levels and for different platform configurations.
  • Implement shell scripts to provide daily health reports and service monitoring on the cluster (see the sketch after this list).
  • Install AtScale development software
  • Implement APIs to schedule AtScale relational cube builds
  • Engineer and develop run books for scheduled AtScale job execution.
  • Create shell scripts that call the AtScale APIs to run scheduled AtScale jobs.
  • Integrate with development and management teams to meet project goals.
  • Collaborate with senior management to strategize cloud-to-on-premises integration of the Hadoop and Apache NiFi clusters.
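
A minimal sketch of the daily cluster health report mentioned above, written in Python against the Ambari REST API (the originals were shell scripts); the Ambari host, cluster name, and credentials are illustrative placeholders.

    # Pull service states from the Ambari REST API and print a daily health summary.
    # Host, cluster name, and credentials are placeholders for illustration.
    import requests

    AMBARI = "http://ambari-host:8080"      # hypothetical Ambari server
    CLUSTER = "prod_cluster"                # hypothetical cluster name
    AUTH = ("admin", "********")

    url = f"{AMBARI}/api/v1/clusters/{CLUSTER}/services?fields=ServiceInfo/state"
    resp = requests.get(url, auth=AUTH, headers={"X-Requested-By": "ambari"}, timeout=30)
    resp.raise_for_status()

    print(f"Daily health report for {CLUSTER}")
    for item in resp.json().get("items", []):
        info = item["ServiceInfo"]
        flag = "OK" if info["state"] == "STARTED" else "WARN"
        print(f"  [{flag:<4}] {info['service_name']:<16} state={info['state']}")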

Confidential

Sr Data Engineer, Senior Staff

Responsibilities:

  • Manage pre-prod and production clusters using Cloudera Manager on AWS to quickly identify failures and resource issues.
  • Contribute to the vision, deployment and administration of Hadoop enabled infrastructure as a core service to all business functions within the organization using AWS infrastructure.
  • Enable user access and manage corporate policies using AWS IAM.
  • Build data expertise and own data quality for ingestion pipelines.
  • Interface with engineers, product managers and product analysts to understand data needs.
  • Architect, build and launch new data models that provide intuitive analytics to end users.
  • Design, build and launch extremely efficient & reliable data pipelines to move data (both large and small amounts) to the Data Warehouses.
  • Design, build and launch new data extraction, transformation and loading processes in production.
  • Create new systems and tools to enable the customer to consume and understand data faster.
  • Apply coding skills across a number of languages, including SQL, Scala, Python, and PySpark
  • Work across multiple teams in high visibility roles and own the solution end-to-end providing advice on the capabilities within AWS.
  • Support the administration of on premise and AWS cloud-based Hadoop clusters.
  • Make information available to large scale, next generation, predictive analytics applications.
  • Support and integrate big data tools/frameworks.
  • Build, implement, and support the data infrastructure; ingest and transform data (ETL/ELT processes).
  • Integrate Apache Kafka for data ingestion
  • Design and implement topic configuration for the new Kafka cluster in all environments (a sketch follows this list)
  • Install Kafka Manager to track consumer lag and monitor Kafka metrics
  • Be available on call (on rotation) in a support role.
  • Manage and participate in the day-to-day operational work across the Hadoop clusters.
  • Work closely with hosted operations colleagues to define operational best practices for the UAT and production Hadoop clusters.
  • Participate in project planning and reviews as they pertain to Hadoop and the Hadoop clusters.
  • Serve as the main point of contact for vendor escalation on open issues related to cluster maintenance, performance, and upgrades.
  • Participate in interviewing candidates to fill open Hadoop development and administration positions.
  • Verify data integrity and accuracy.
  • Validate schematic designs working alongside Hadoop data engineers.
  • Work closely with offshore teams to meet project objectives and support cluster maintenance.
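
A minimal sketch of per-environment topic configuration as referenced above, using the kafka-python admin client in Python; broker addresses, the topic name, and the partition, replication, and retention settings are illustrative assumptions.

    # Create a topic with environment-specific partition and replication settings.
    # Broker lists, topic name, and configs are placeholders, not project values.
    from kafka.admin import KafkaAdminClient, NewTopic

    ENVIRONMENTS = {
        "uat":  {"brokers": ["uat-kafka:9092"],  "partitions": 3,  "replication": 2},
        "prod": {"brokers": ["prod-kafka:9092"], "partitions": 12, "replication": 3},
    }

    for env, cfg in ENVIRONMENTS.items():
        admin = KafkaAdminClient(bootstrap_servers=cfg["brokers"],
                                 client_id=f"topic-setup-{env}")
        topic = NewTopic(
            name="customer-events",                       # hypothetical topic
            num_partitions=cfg["partitions"],
            replication_factor=cfg["replication"],
            topic_configs={"retention.ms": "604800000"},  # 7-day retention
        )
        admin.create_topics([topic])                      # fails if the topic already exists
        print(f"created customer-events in {env}")
        admin.close()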

Confidential

Sr Hadoop Administrator

Responsibilities:

  • Collaborate with Think Big's Engineering, Data Science, and Customer teams to scope, size, install, and configure components of the open source Hadoop Big Data platform in AWS, Teradata, or RHEL environments.
  • Install peripheral tools such as HUE, RStudio and OpenLDAP.
  • Manage users on the Hadoop platform with enterprise Active Directory.
  • Encrypt data at rest and in motion.
  • Serve as an escalation point for the Think Big Managed Services teams that provide on-going operational support for Think Big's customers.
  • Collaborate with Think Big's project leads to develop project level scoping plan.
  • Research tools to accommodate customer requirements.
  • Develop test plans for initial Hadoop services component testing.
  • Develop POC application projects using Hive, HBase, Pig, PySpark, and Oozie.
  • Perform system administration tasks, including using Ambari to install and provision Hadoop clusters, onboard users to the Hadoop cluster, and set up High Availability (HA) for key components such as the NameNode, Resource Manager, and any other component identified as critical to the customer or use cases.
  • Work with Think Big's operations practice leads to develop and ensure consistent deployment of best practices across all of Think Big's projects.
  • Design and develop tools to support proactive administration and monitoring of the open source Hadoop Big Data platform.
  • Support sales efforts scoping engagements and developing statements of work.
  • Apply tuning to components such as Hive, HBase and Spark to enhance performance.
  • Familiar with the use of Apache Ranger for user authorization, access, and for auditing on Hortonworks data platform and the use of Apache Knox for Hadoop perimeter gateway access.
  • Use Linux shell scripting where necessary to automate tasks or to fill gaps where tools cannot perform the tasks.
  • Familiar with Hadoop streaming components such as Kafka, Spark, Storm, and Flink.
  • Integrate with client development teams to build applications and data pipelines based on specific use case scenarios.
  • Familiar with cloud computing environments such as AWS and Google Cloud.
  • Collaborate with systems administrators and architects when necessary to perform system designs.
  • Familiar with Virtual Machine tools such as VMWare and VirtualBox.
  • Worked with clients on both Hortonworks and Cloudera Hadoop distributions.
  • Using both Ambari and Cloudera Manager to install and manage clusters.
  • Designed and implemented topic configuration for the new Kafka cluster in all environments.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.
  • HBase database administration
  • GCP platform implementation for data analytics
  • AWS IAM and EMR deployment for Hadoop services and access
  • AWS Snowflake data warehouse migration from onsite to cloud environment
  • AWS S3 setup and configuration (buckets); see the sketch below
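
A minimal boto3 sketch of the S3 bucket setup and configuration noted above; the bucket name and region are illustrative assumptions, and a production setup would add bucket policies, lifecycle rules, and tagging as required.

    # Create an S3 bucket, enable default encryption, and block public access.
    # Bucket name and region are placeholders for illustration.
    import boto3

    REGION = "us-east-1"
    BUCKET = "example-datalake-raw"       # hypothetical bucket name

    s3 = boto3.client("s3", region_name=REGION)
    s3.create_bucket(Bucket=BUCKET)       # us-east-1 requires no LocationConstraint

    # Default server-side encryption for all new objects.
    s3.put_bucket_encryption(
        Bucket=BUCKET,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
        },
    )

    # Block all forms of public access on the bucket.
    s3.put_public_access_block(
        Bucket=BUCKET,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )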
