Hadoop/Big Data Engineer Resume
Houston, TX
PROFESSIONAL SUMMARY:
- Over 10 years of professional IT experience, including 5+ years of strong experience with Apache Spark, Big Data and the Hadoop ecosystem, and 5 years of end-to-end experience in Python and Java programming, covering the design, development and implementation of various web-based applications.
- Hands-on experience developing and deploying enterprise applications using major Hadoop ecosystem components such as Hadoop 2.x, YARN, Hive, Pig, Spark, MapReduce, Impala, Kafka, Oozie, HBase, Flume, Sqoop and ZooKeeper.
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration and support of Hadoop.
- Experience in the Python programming language for framework development, and in core Java concepts.
- Experience with on-premises distributions (Hortonworks, MapR) and Google Cloud Platform.
- Experience in monitoring, tuning and administering Hadoop clusters.
- Experience in understanding Big Data business requirements and providing Hadoop-based solutions for them.
- Experience importing and exporting data between databases such as MySQL, Oracle and Teradata and HDFS using Sqoop.
- Worked on Spark 1.6.0 for data processing using RDDs and the DataFrame API.
- Experience in writing UDFs in Hive for processing and analyzing large datasets.
- Experience in working with different file formats and compression techniques in Hadoop.
- Experience in using NFS (Network File System) for backing up NameNode metadata.
- Experience in managing cluster resources by implementing the Fair Scheduler and Capacity Scheduler.
- Experience in developing Pig Latin scripts for data processing on HDFS.
- Excellent team player with good communication skills and effective time management.
- Understand business process management and customers' business requirements and translate them into specific software requirements.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL and Spark Streaming.
- Experience in using Scala to convert Hive/SQL queries into RDD transformations in Spark.
- Strong knowledge of real-time data analytics using Spark Streaming, Kafka and Flume.
- Proficient with Kafka, and with Spark in YARN, local and standalone modes.
- Expertise in writing Spark RDD transformations, actions and case classes for input data, and in performing data transformations using Spark Core.
- Implemented job scheduling using Azkaban, Tidal Enterprise Scheduler, crontab and Oozie.
- Experience in using DStreams, Broadcast Variables, RDD caching for Spark Streaming.
- Improved the performance of, and optimized, existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Hands on experience with ORC, AVRO, Sequence and Parquet file formats.
- Experience in analyzing data using Pig Latin, HiveQL and Spark SQL.
- Experience with Hadoop Distributions like Cloudera and Hortonworks.
- Extensive knowledge of designing Hive managed/external tables, views and Hive analytical functions.
- Experience in tuning the performance of Hive queries using partitioning and bucketing.
- Experience working with Flume to handle large volumes of streaming data ingestion.
- Experience in developing customized UDFs and UDAFs to extend the core functionality of Pig and Hive.
- Experience in various Big Data application phases like Data Ingestion, Data analytics and Data visualization.
- Proficient in working with NoSQL databases such as HBase and MongoDB.
- Expertise in writing Pig and Hive queries for analyzing data to meet business requirements.
- Experience designing pipeline flows with Jenkins, Tonomi and Azkaban.
- Exposure to build tools such as Maven and SBT and the bug-tracking tool JIRA in the work environment.
- Good knowledge of job/workflow scheduling and monitoring tools such as Azkaban and Confidential Tidal Scheduler.
- Hands-on experience importing/exporting data between RDBMS and HDFS using Sqoop.
- Excellent programming skills at a high level of abstraction using Java, Scala, Python and SQL.
- Coordinate patch upgrades, bug fixes and new releases for the application within stipulated timelines.
- Perform team lead activities, coordinate with team members and define time estimates for deliverables of change requests, patches and upgrades to the application.
TECHNICAL SKILLS:
Hadoop Ecosystem Development: HDFS, MapReduce, Hive, Pig, TES, HBase, Sqoop, ZooKeeper, Spark, MCS, Azkaban, Ambari
Hadoop Distribution Framework: MapR, Hortonworks
Cloud Technologies: Google Cloud Platform
Languages: Java, C, C++
Scripting: Shell Script, Perl, JavaScript and PowerShell
Hadoop Ingestion: Apache Sqoop, Apache Kafka, Apache Spark, Apache Flume, Storm.
Database: MongoDB, Oracle 10g, Oracle 11g, MySQL, Teradata, HBase, Netezza.
Operating Systems: Linux, Unix, Windows
Development Tools: Eclipse, PuTTY, Tectia
Java Technologies: JSON, JDBC, AJAX
PROFESSIONAL EXPERIENCE:
Hadoop/Big Data Engineer
Confidential, Houston, TX
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing.
- Moved data from DB2 and Oracle Exadata to HDFS, and vice versa, using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with different file formats and compression techniques to determine standards
- Developed Hive queries and UDFs to analyze/transform the data in HDFS.
- Developed Hive scripts for implementing control-table logic in HDFS.
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Developed Pig scripts and UDFs as per the business logic.
- Developed user-defined functions in Pig using Python.
- Analyzed/transformed data with Hive and Pig.
- Developed Oozie workflows that are scheduled to run on a monthly basis.
- Designed and developed read lock capability in HDFS.
- Implemented a Hadoop Float equivalent of the DB2 Decimal type.
- Involved in the end-to-end implementation of ETL logic.
- Coordinated effectively with the offshore team and delivered project deliverables on time.
- Worked on QA support activities, test data creation and Unit testing activities.
Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, XML, ETL, DB2 and QA
Big Data Engineer
Confidential, San Jose, CA
Responsibilities:
- Review Tidal Enterprise Scheduler jobs and verify that all jobs are triggered at the appropriate time and complete successfully.
- Investigate failed jobs, perform root cause analysis (RCA) and fix the issues.
- Develop scripts in Python for the data lake framework.
- Assist project teams with their queries on data lake tables.
- Investigate and fix any data-mismatch issues identified or raised by the project teams.
- Coordinate with project teams on any data lake related changes and take initiatives such as standardizing naming conventions.
- Handle ad hoc table refresh requests and assist project teams with testing and validation during go-live.
- Coordinate and help fix issues with the data catalogue when it goes down.
- Assist project teams with approval and access-management queries related to the tables.
- Load PVs with large data volumes using TPT or by max update date, and ensure that the loads succeed.
- Coordinate with the Teradata DBA team and Teradata support team on any TPT-related queries or issues.
- Worked on Scala 2.10 jobs using Spark 1.6.0 for data processing using RDDs and the DataFrame API.
- Performance tuning of Spark and Sqoop jobs.
- Coordinate with project teams for their ingestion requests
- Review the metadata they provide, including database credentials, incremental columns and the uniqueness of merge keys.
- Discuss with the Architect team to obtain approval before proceeding with metadata creation.
- Create metadata and get it reviewed by the Architect team.
- Load the tables, run the audits and share the results with the teams.
- Schedule jobs in TES and validate that they succeed.
- Ingested data from the AWS cloud into the Hadoop data lake.
- Enterprise Data and Analytics: worked on automating Hadoop processes.
- Involved in requirement gathering, discussions with project teams and Joint Application Design meetings; build and update user stories.
- Validate requirements, review screen design prototypes, test the functionality, participate in User Acceptance Testing and actively support go-live activities.
Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, Jira, XML, DB2 and QA.
Big Data Engineer
Confidential, Milpitas, CA
Responsibilities:
- As part of the Center of Excellence team, triage, investigate and fix application issues.
- Monitor Dataproc clusters and jobs using the GCP Console and dashboards using Stackdriver, performance-tune and optimize memory-intensive jobs, and provide L3 support for applications in the production environment.
- DAS (Data as a Service) in GCP using Google Cloud Storage buckets.
- Monitor Azkaban jobs on-premises (Hortonworks distribution) and in GCP (Google Cloud Platform).
- Investigate failed jobs, perform root cause analysis (RCA) and fix the issues.
- As part of the Production Engineering team, keep the environment healthy 24/7.
- Monitor Azkaban, Ambari, Splunk and Net Diagnostics for HBase timeouts and Propensities.
- GCP: Stackdriver for monitoring and logging, Compute Engine and Dataproc.
- Discuss issues with project teams to understand them, then investigate, perform RCA and fix them.
- Handle production issues assigned as JIRA tasks, and incidents/requests raised through Remedy.
- Participate in discussions, take initiative and work with application teams to keep projects running smoothly.
- Participate in all technical discussions and scrum meetings.
- Review project documents received from other Technical Specialists, Business Technical Specialists and Project Managers to ensure quality, completeness and adherence to documentation standards.
Environment: Hadoop, Java, HDFS, Jira, Azkaban, MapReduce, Hive, Sqoop, Pig, XML, ETL, DB2 and QA
Java/Hadoop Developer
Confidential, San Diego, CA
Responsibilities:
- Installed and configured a Hadoop cluster along with Hive.
- Developed MapReduce applications using the Hadoop MapReduce programming framework to process large data sets in parallel across the Hadoop cluster for pre-processing.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
- Responsible for designing the front-end system using HTML, CSS, JSP, Servlets and Ajax.
- Transformed the web application into a compatible mobile and tablet application by designing responsive layouts using HTML and CSS.
- Used LDAP for user authentication and authorization.
- Created stored procedures, views, cursors and functions to support the application.
- Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs as a directed acyclic graph (DAG) of actions with control flows.
- Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS and executed them from Hive queries.
- Managed and reviewed Hadoop log files.
- Responsible for managing data coming from different sources.
- Assisted in monitoring the Hadoop cluster using Ganglia tool.
- Handled high volumes of data in the cluster.
- Tested and reported defects following Agile methodology.
- Consolidated all defects, reported them to the PM/leads for prompt fixes by development teams and drove them to closure.
- Installed and configured a 3-node Hadoop cluster in fully distributed mode.
- Installed Hadoop ecosystem components (Hive, Pig, Sqoop, HBase, Oozie) on top of the Hadoop cluster.
- Imported data from Oracle into HDFS and Hive for analytical purposes.
- Analyzed imported data in HDFS and Hive using HiveQL and custom MapReduce programs in Java.
Environment: Java, CDH, Hadoop, HDFS, MapReduce, Hive and Sqoop
Java Developer
Confidential
Responsibilities:
- Actively involved in the analysis, design, implementation and deployment phases of the project's full software development lifecycle (SDLC).
- Designed and developed user interface using JSP, HTML and JavaScript.
- Developed JSP custom tag libraries for tree structures and grids with pagination logic.
- Worked extensively with JSPs and Servlets to accommodate all presentation customizations on the front end.
- Used build tools such as Maven to build, package, test and deploy the application to the application server.
- Developed Struts action classes and action forms, performed action mapping using the Struts framework, and performed data validation in form beans and action classes.
- Extensively used the Struts framework as the controller to handle client requests and invoke the model based on user requests.
- Defined the search criteria and retrieved customer records from the database.
- Made the required changes and saved the updated records back to the database.
- Validated the fields of user registration screen and login screen by writing JavaScript validations.
- Involved in developing and coding the interfaces and classes required for the application, and created appropriate relationships between the system classes and the interfaces provided.
- Developed build and deployment scripts using Apache Ant to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
- Used SVN for source control and version management.
- Used JIRA to manage issues and the project workflow.
- Involved in peer code reviews and performed integration testing of the modules. Followed coding and documentation standards.
- Involved in post-production support and maintenance of the application.
Environment: Oracle, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat, PL/SQL, JIRA, SVN.
