
Big Data Engineer Lead Resume

SUMMARY

  • Total IT experience of 14 years in analysis, design, development, implementation, support, and testing of software applications, including 4 years of Big Data/Hadoop experience with HDFS, Hive, Hue, Spark, Kafka, Flume, Pig, Sqoop, HBase, YARN, Oozie, Python, Hortonworks, Cloudera, Unix shell scripting, MapReduce, AWS EC2, Google Cloud, Core Java, and Agile.
  • Working as a Senior Big Data Engineer using Core Java, HiveQL, Sqoop, MapReduce, Scala, Spark SQL, and PySpark.
  • Hands-on ETL experience ingesting data from RDBMS sources (SQL Server, MySQL, DB2) into Hive and HDFS using Sqoop.
  • Working in PySpark, particularly with SparkSession, and in Python with pandas and generators (a PySpark sketch follows this list).
  • Good exposure to the ORC, Avro, and Parquet file formats in HiveQL.
  • Working on HiveQL and Spark SQL, with extensive knowledge of SQL Server queries for complex datasets.
  • Hands-on experience parsing XML, JSON, and CSV files using Python scripts.
  • Good exposure to GitHub repositories and the CI/CD tool Jenkins.
  • Involved in creating, tuning, partitioning, and bucketing tables and writing queries in Hive.
  • Working with the job automation tool Tidal, JIRA tickets, ServiceNow, SecureCRT, SecureFX, and WinSCP.
  • Handling thousands of jobs per day, performing data ingestion and tracking job status through the automation tool Tidal.
  • Working on JSON script generation and writing Unix shell scripts to invoke Sqoop import/export.
  • Working with product owners to review requirements, analyze business logic, and translate them into detailed tasks for the corresponding user stories in Rally.
  • Working closely with team members to implement the requirements captured in Rally user stories.
  • Working with relational SQL and NoSQL databases, including Oracle, Hive, and HBase, using Sqoop to move data between them.
  • Monitoring jobs, queues, and HDFS capacity using ZooKeeper and vendor-specific front-end cluster management tools.
  • Worked on the YARN Capacity Scheduler and on commissioning and decommissioning cluster nodes.
  • Hands-on experience in security setup with Ranger, Kerberos, and LDAP, and integration with Active Directory.
  • Onboarding users to Hadoop: configuration, access control, disk quotas, permissions, etc.
  • Good exposure to the Hadoop framework, Big Data concepts, and cluster configuration.
  • Configured and worked with a 100-node Hadoop cluster; installed and configured Hadoop, HBase, Hive, Pig, Sqoop, and Flume using the Ambari UI.
  • Experience using ServiceNow and BMC Remedy to handle incidents, tickets, and change creation.
  • Excellent understanding of Hadoop architecture and cluster components, including JobTracker, TaskTracker, NameNode, and DataNode.
  • Strong experience collecting and storing streaming data, such as log data, in HDFS using Apache Flume.
  • Hands-on experience in .NET, Java, JSP, SQL Server, MySQL, Access, VBA macros, Oracle PL/SQL, and Crystal Reports.
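
To give a flavor of the Spark/Hive work listed above, the following is a minimal PySpark sketch, not production code: the database, table, and column names (sales.orders, order_id, load_date) and the output path are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Build a SparkSession with Hive support so Spark SQL can see metastore tables
    spark = (
        SparkSession.builder
        .appName("hive-ingestion-example")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Spark SQL / HiveQL query against a hypothetical partitioned Hive table
    daily_orders = spark.sql("""
        SELECT order_id, customer_id, order_total
        FROM sales.orders
        WHERE load_date = '2020-01-15'
    """)

    # Persist in a columnar format (Parquet here; ORC works via .format("orc"))
    daily_orders.write.mode("overwrite").parquet("/data/curated/orders/2020-01-15")

    spark.stop()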

TECHNICAL SKILLS

Operating Systems: Windows, Linux, Ubuntu; Terminal Clients: PuTTY, SecureCRT

Languages/Technology: Big Data, Hadoop, HDFS, PySpark, Python, Spark, MapReduce, Hive, Hue, Pig, Sqoop, Flume, Kafka, Storm; Core Java, .NET, JSP, RESTful APIs, JavaScript, ASP.NET, VB.NET, C#, VB6, Classic ASP, VBA Macros

Databases: HBase, MySQL 5.0, Oracle 10g, SQL Server, DB2

Development and Build Tools: Eclipse, Jenkins, GitHub

Scripting Tools: Shell Scripting, JavaScript, VBScript

Cloud Technologies: AWS (S3, EC2), Google Cloud

PROFESSIONAL EXPERIENCE

Confidential

Big Data Engineer Lead

Responsibilities:

  • Working on data ingestion requests from many sources: DB2, Oracle, SQL Server, and MySQL servers.
  • Handling thousands of jobs per day, performing data ingestion and tracking job status through the automation tool Tidal.
  • Working on JSON script generation and writing Unix shell scripts to invoke Sqoop import/export.
  • Working on Python scripts for XML and JSON parsing of data based on requirements (see the parsing sketch after this list).
  • Ingesting tables from multiple RDBMS sources into Hive and HDFS.
  • Handling Spark SQL and HiveQL queries through a SparkSession.
  • Using Agile methodology to take on work and deliver output on a story and sprint basis.
  • Working on Python and PySpark scripts based on user requirements.
  • Providing support to users for diagnosing, reproducing, and fixing Hadoop-related issues.
  • Using JIRA and ServiceNow for ticketing, incidents, and change request creation.
  • Created a Unix smoke-test script for cluster health checks.
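
A minimal sketch of the XML/JSON/CSV parsing described above, using only the Python standard library; the file names and field/element names (feed.json, id, status, job) are hypothetical placeholders.

    import json
    import csv
    import xml.etree.ElementTree as ET

    # JSON: load a feed file and pull out the fields of interest
    with open("feed.json") as f:
        records = json.load(f)
    rows = [(r["id"], r["status"]) for r in records]

    # XML: iterate over repeated elements and read attributes/child text
    root = ET.parse("feed.xml").getroot()
    for job in root.iter("job"):
        print(job.get("name"), job.findtext("status"))

    # CSV: write the extracted rows back out for downstream ingestion
    with open("feed_extract.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)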

Confidential

BigData Developer

Responsibilities:

  • Involved in writing MapReduce jobs, Hive queries, and Pig scripts, and in resolving Hive and HBase issues.
  • Involved in Hive performance tuning and in writing HiveQL and Pig scripts based on requirements.
  • Involved in resolving access, performance, and patch/upgrade-related issues.
  • Involved in cluster configuration and in monitoring cluster activity through the Ambari UI and the Unix CLI.
  • Performed and supported change tasks in the Big Data/Hadoop environments, covering changes across the entire Hadoop ecosystem (e.g., hardware, OS, cluster, Hive, or HBase).
  • Validated and/or turned down services before/after any infrastructure (network, hardware, OS) changes.
  • Involved in Hive query performance work, table maintenance, and HBase activities.
  • Provided support to users for diagnosing, reproducing, and fixing Hadoop-related issues.
  • Used BMC Remedy for ticketing, incidents, and change request creation.
  • Created a smoke-test script for cluster health checks (sketched below).
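
The smoke test was written as a Unix shell script; below is an equivalent sketch in Python (used here for consistency with the other examples), assuming standard Hadoop CLIs are on the PATH. The exact checks in the original script are not shown, so this particular set is an assumption.

    import subprocess
    import sys

    # Assumed health checks built from standard Hadoop/Hive command-line tools
    CHECKS = [
        ["hdfs", "dfsadmin", "-safemode", "get"],   # NameNode out of safe mode?
        ["hdfs", "dfsadmin", "-report"],            # DataNode/capacity report
        ["yarn", "node", "-list"],                  # NodeManagers registered?
        ["hive", "-e", "SHOW DATABASES;"],          # metastore/Hive reachable?
    ]

    failed = []
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "OK" if result.returncode == 0 else "FAIL"
        print(f"[{status}] {' '.join(cmd)}")
        if result.returncode != 0:
            failed.append(cmd)

    sys.exit(1 if failed else 0)   # non-zero exit lets a scheduler flag the failure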

Confidential

Hadoop Developer

Responsibilities:

  • Working with product owners to review requirements, analyze business logic, and translate them into detailed tasks for the corresponding user stories in Rally.
  • Participating in daily scrum meetings to collaborate with other scrum team members.
  • Working closely with team members to implement the requirements captured in Rally user stories.
  • Working with relational SQL and NoSQL databases, including Oracle, Hive, and HBase, using Sqoop to move data between them.
  • Involved in Hive performance tuning and in writing Hive and Pig UDFs based on requirements.
  • Involved in importing and exporting data between HDFS and RDBMS sources using Sqoop (see the Sqoop sketch after this list), and in resolving access, performance, and patch/upgrade-related issues.
  • Involved in Hive query performance work, table maintenance, and HBase activities.
  • Preparing and reviewing runbooks for change requests and use case documents.
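
The Sqoop imports above were driven from Unix shell scripts; the sketch below shows the same idea from Python for consistency with the other examples. The connection string, credentials path, and table names are hypothetical placeholders, and only standard Sqoop flags are used.

    import subprocess

    # Assemble a Sqoop import that lands an RDBMS table directly in Hive
    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost:3306/salesdb",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",   # keeps credentials off the command line
        "--table", "orders",
        "--hive-import",                             # load straight into a Hive table
        "--hive-table", "staging.orders",
        "--num-mappers", "4",                        # parallel map tasks for the import
    ]

    subprocess.run(sqoop_cmd, check=True)   # raises CalledProcessError on failure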
