Big Data Engineer Lead Resume
SUMMARY
- 14 years of total IT experience in analysis, design, development, implementation, support, and testing of software applications, including 4 years in Big Data/Hadoop using HDFS, Hive, Hue, Spark, Kafka, Flume, Pig, Sqoop, HBase, YARN, Oozie, Python, Hortonworks, Cloudera, Unix shell scripting, MapReduce, AWS, EC2, Google Cloud, Core Java, and Agile.
- Working as a Senior Big Data Engineer using Core Java, HiveQL, Sqoop, MapReduce, Scala, Spark SQL, and PySpark.
- Hands-on ETL experience ingesting data from RDBMS sources (SQL Server, MySQL, DB2) into Hive and HDFS using Sqoop.
- Working in PySpark, particularly with SparkSession, and with Pandas and generators in Python (see the sketch at the end of this summary).
- Good exposure to ORC, Avro, and Parquet file formats in HiveQL.
- Working with HiveQL and Spark SQL, with extensive knowledge of SQL Server queries for complex datasets.
- Hands-on experience parsing XML, JSON, and CSV files using Python scripts.
- Good exposure to GitHub repositories and Jenkins for CI/CD.
- Involved in creating, tuning, partitioning, and bucketing tables and writing queries in Hive.
- Working with the TIDAL job automation tool, JIRA tickets, ServiceNow, SecureCRT, SecureFX, and WinSCP.
- Handling thousands of jobs per day, performing data ingestion and tracking job status through the TIDAL automation tool.
- Generating JSON scripts and writing Unix shell scripts to call Sqoop import/export.
- Working with product owners to review requirements, analyze business logic, and translate them into detailed tasks for user stories in Rally.
- Working closely with team members to implement the requirements captured in Rally user stories.
- Working with relational SQL and NoSQL databases, including Oracle, Hive, and HBase, with Sqoop for data transfer.
- Monitoring jobs, queues, and HDFS capacity using ZooKeeper and vendor-specific front-end cluster management tools.
- Worked on the YARN Capacity Scheduler and on commissioning and decommissioning cluster nodes.
- Hands-on experience in security setup with Ranger, Kerberos, and LDAP, and integration with Active Directory.
- Onboarding users to Hadoop: configuration, access control, disk quotas, permissions, etc.
- Good exposure to the Hadoop framework, Big Data concepts, and cluster configuration.
- Configured and worked with a 100-node Hadoop cluster; installed and configured Hadoop, HBase, Hive, Pig, Sqoop, and Flume using the Ambari UI.
- Experience using ServiceNow and BMC Remedy to handle incidents, tickets, and change creation.
- Excellent understanding of Hadoop architecture and the components of Hadoop clusters, including JobTracker, TaskTracker, NameNode, and DataNode.
- Strong experience collecting and storing streaming data, such as log data, in HDFS using Apache Flume.
- Hands-on experience in .NET, Java, JSP, SQL Server, MySQL, Access, VBA macros, Oracle PL/SQL, and Crystal Reports.
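A minimal sketch of the SparkSession-driven Hive workflow referenced above, assuming a Hive-enabled Spark deployment; the database, table, and column names (sales_db.orders, order_date, etc.) are hypothetical placeholders:

```python
# Sketch: query a Hive table through SparkSession and write it back as
# partitioned ORC (one of the file formats listed above). All table and
# column names here are hypothetical.
from pyspark.sql import SparkSession

# SparkSession with Hive support so HiveQL tables are visible to Spark SQL
spark = (
    SparkSession.builder
    .appName("hive-ingestion-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Run a HiveQL/Spark SQL query against an existing Hive table
orders = spark.sql("SELECT order_id, order_date, amount FROM sales_db.orders")

# Persist the result as a partitioned ORC table
# (Parquet works the same way via .format("parquet"))
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .format("orc")
    .saveAsTable("sales_db.orders_orc")
)

spark.stop()
```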
TECHNICAL SKILLS
Operating Systems: Windows, Linux, Ubuntu (accessed via PuTTY and SecureCRT)
Languages/Technologies: Big Data, Hadoop, HDFS, PySpark, Python, Spark, MapReduce, Hive, Hue, Pig, Sqoop, Flume, Kafka, Storm, Core Java, .NET, VBA Macros, JSP, RESTful APIs, JavaScript, ASP.NET, VB.NET, C#, VB6, Classic ASP
IDEs and Build Tools: Eclipse
Databases: HBase, MySQL 5.0, Oracle 10g, SQL Server
Scripting Tools: Shell Scripting, JavaScript, VBScript
Cloud Technologies: AWS (S3, EC2), Google Cloud
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer Lead
Responsibilities:
- Working on data ingestion requests from multiple sources: DB2, Oracle, SQL Server, and MySQL servers.
- Handling thousands of jobs per day, performing data ingestion and tracking job status through the TIDAL automation tool.
- Generating JSON scripts and writing Unix shell scripts to call Sqoop import/export.
- Working on Python scripts for XML and JSON parsing of data based on requirements (see the parsing sketch after this list).
- Ingesting tables from multiple RDBMS sources into Hive and HDFS.
- Handling Spark SQL and HiveQL queries using SparkSession.
- Using Agile methodology, taking on work and delivering output on a story and sprint basis.
- Working on Python and PySpark scripts based on user requirements.
- Provided support to users for diagnosing, reproducing, and fixing Hadoop-related issues.
- Using JIRA and ServiceNow for ticketing, incidents, and change request creation.
- Created a Unix smoke test script for cluster health checks.
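A minimal sketch of the XML/JSON parsing step described above, using only the Python standard library; the file names and tag/field names are hypothetical:

```python
# Sketch: parse an XML feed and a JSON file, then emit one JSON record per
# line for downstream ingestion. File, tag, and field names are hypothetical.
import json
import xml.etree.ElementTree as ET

# Pull one dict per <record> element from the XML document
root = ET.parse("feed.xml").getroot()
records = [
    {"id": rec.get("id"), "status": rec.findtext("status")}
    for rec in root.iter("record")
]

# Merge in records from a JSON payload with a top-level "records" array
with open("feed.json") as fh:
    records.extend(json.load(fh).get("records", []))

# Write newline-delimited JSON, a convenient format for an HDFS put
with open("feed.jsonl", "w") as out:
    for rec in records:
        out.write(json.dumps(rec) + "\n")
```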
Confidential
Big Data Developer
Responsibilities:
- Involved in writing MapReduce jobs, Hive queries, and Pig scripts, and in resolving Hive and HBase issues.
- Involved in Hive performance tuning and writing HiveQL and Pig scripts based on requirements.
- Involved in resolving access issues, performance issues, and patch/upgrade-related issues.
- Involved in cluster configuration and in monitoring cluster activity through the Ambari UI and the Unix CLI.
- Performing and supporting change tasks for the Big Data/Hadoop environments, covering changes across the entire Hadoop ecosystem (e.g., hardware, OS, cluster, Hive, or HBase).
- Validating and/or turning down services before and after any infrastructure (network, hardware, OS) changes.
- Involved in Hive query performance work, table maintenance, and HBase activities.
- Provided support to users for diagnosing, reproducing, and fixing Hadoop-related issues.
- Using BMC Remedy for ticketing, incidents, and change request creation.
- Created a smoke test script for cluster health checks (sketched below).
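The smoke test above is summarized here as a minimal Python sketch; the specific checks it ran are an assumption about what such a script typically covers:

```python
# Sketch of a cluster health smoke test. The checks below (HDFS report,
# HDFS write, trivial Hive query) are illustrative assumptions.
import subprocess
import sys

CHECKS = [
    ("hdfs report", ["hdfs", "dfsadmin", "-report"]),
    ("hdfs write", ["hdfs", "dfs", "-touchz", "/tmp/smoke_test_marker"]),
    ("hive query", ["hive", "-e", "SHOW DATABASES;"]),
]

failed = False
for name, cmd in CHECKS:
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "OK" if result.returncode == 0 else "FAIL"
    print(f"[{status}] {name}")
    failed = failed or result.returncode != 0

# Non-zero exit lets a scheduler flag the failed health check
sys.exit(1 if failed else 0)
```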
Confidential
Hadoop Developer
Responsibilities:
- Working with product owners to review requirements, analyze business logic, and translate them into detailed tasks for user stories in Rally.
- Participating in daily scrum meetings to collaborate with other scrum team members.
- Working closely with team members to implement the requirements captured in Rally user stories.
- Working with relational SQL and NoSQL databases, including Oracle, Hive, and HBase, with Sqoop for data transfer.
- Involved in Hive performance tuning and in writing Hive UDFs and Pig UDFs based on requirements (see the sketch after this list).
- Involved in importing and exporting data to and from HDFS using Sqoop, and in resolving access, performance, and patch/upgrade-related issues.
- Involved in Hive query performance work, table maintenance, and HBase activities.
- Preparing and reviewing runbooks for change requests and use case documents.
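The Hive UDFs above would normally be Java classes; as a Python-based illustration of the same row-level idea, Hive's TRANSFORM clause can stream rows through a script. The script, table, and column names here are hypothetical:

```python
#!/usr/bin/env python
# clean_status.py - hypothetical row-level transform, invoked from HiveQL as:
#   ADD FILE clean_status.py;
#   SELECT TRANSFORM (id, status)
#   USING 'python clean_status.py' AS (id, status_norm)
#   FROM events;
import sys

# Hive streams each row to stdin as tab-separated text
for line in sys.stdin:
    row_id, status = line.rstrip("\n").split("\t")
    # The UDF-style logic: normalize the status field
    print(f"{row_id}\t{status.strip().lower()}")
```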