Hadoop/Big Data Engineer Resume Houston, TX - Hire IT People

PROFESSIONAL SUMMARY:

Over 10 years of professional IT experience, including 5+ years of strong experience working with Apache Spark, Big Data and the Hadoop ecosystem, and 5 years of strong end-to-end experience in Python and Java programming, covering the design, development and implementation of various web-based applications using Python and Java technologies.
Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, Spark, MapReduce, Impala, Kafka, Oozie, HBase, Flume, Sqoop and ZooKeeper.
Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration and support of Hadoop.
Experience using Python for framework development, along with core Java concepts.
Experience with on-prem (HortonWorks, MapR) and Google Cloud Platform.
Experience in monitoring, tuning and administrating Hadoop cluster.
Experience in understanding Big Data business requirements and providing them Hadoop based solutions.
Experience in Importing and exporting data from different databases like MySQL, Oracle, Teradata into HDFS using Sqoop.
Worked on Spark 1.6.0 for data processing using RDDs and the DataFrame API.
Experience in writing UDFs in Hive for processing and analyzing large datasets.
Experience in working with different file formats and compression techniques in Hadoop.
Experience in using NFS (Network File System) for backing up NameNode metadata.
Experience in managing cluster resources by implementing the Fair Scheduler and Capacity Scheduler.
Experience in developing Pig Latin scripts for data processing on HDFS.
Excellent team player with good communication skills and effective time management.
Understand business process management and business requirements of the customers and translate them to specific software requirements.
In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Spark Streaming.
Experience in using Scala to convert Hive/SQL queries into RDD transformations in Spark.
Strong knowledge of real-time data analytics using Spark Streaming, Kafka & Flume.
Proficient with Kafka and with Spark in YARN, local & standalone modes.
Expertise in writing Spark RDD transformations, actions and case classes for input data, and in performing data transformations using Spark Core.
Implemented job scheduling using Azkaban, Tidal Enterprise Scheduler, crontab and Oozie.
Experience in using DStreams, broadcast variables and RDD caching for Spark Streaming.
Improved the performance of, and optimized, existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs & Spark on YARN.
Hands on experience with ORC, AVRO, Sequence and Parquet file formats.
Experience in analyzing data using Pig Latin, HiveQL and Spark SQL.
Experience with Hadoop Distributions like Cloudera and Hortonworks.
Extensive knowledge on designing Hive Managed/External tables, Views & Hive Analytical functions.
Experience in tuning the performance of hive queries using Partitioning and Bucketing.
Experience working with Flume to handle large volumes of streaming data ingestion.
Experience in developing customized UDFs and UDAFs to extend the core functionality of Pig and Hive.
Experience in various Big Data application phases like Data Ingestion, Data analytics and Data visualization.
Proficient in working with NoSQL databases such as HBase and MongoDB.
Expertise in writing Pig and Hive queries for analyzing data to meet business requirements.
Experience in designing pipeline flows with Jenkins, Tonomi and Azkaban.
Exposure to build tools like Maven and SBT and the bug-tracking tool JIRA in the work environment.
Good knowledge of job/workflow scheduling and monitoring tools like Azkaban and Confidential Tidal Scheduler.
Hands-on experience in importing/exporting data between RDBMS and HDFS using Sqoop.
Excellent programming skills at high level abstraction using Java, Scala, Python & SQL.
Coordinated patch upgrades, bug fixes and new releases for the application within stipulated timelines.
Performed team lead activities, coordinated with team members, and defined time estimates for deliverables of change requests, patches and upgrades to the application.

TECHNICAL SKILLS:

Hadoop Ecosystem Development: HDFS, MapReduce, Hive, Pig, TES, HBase, Sqoop, Zookeeper, Spark, MCS, Azkaban, Ambari

Hadoop Distribution Framework: MapR, HortonWorks

Cloud Technologies: Google Cloud Platform

Languages: Java, C, C++

Scripting: Shell Script, Perl, JavaScript and PowerShell

Hadoop Ingestion: Apache Sqoop, Apache Kafka, Apache Spark, Apache Flume, Storm.

Database: MongoDB, Oracle 10g, Oracle 11g, MySQL, Teradata, HBase, Netezza.

Operating Systems: Linux, Unix, Windows

Development Tools: Eclipse, PuTTY, Tectia

Java Technologies: JSON, JDBC, AJAX

PROFESSIONAL EXPERIENCE:

Hadoop/Big Data Engineer

Confidential, Houston, TX

Responsibilities:

Developed MapReduce jobs in Java for data cleansing and preprocessing.
Moved data from DB2 and Oracle Exadata to HDFS and vice versa using Sqoop.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Worked with different file formats and compression techniques to determine standards.
Developed Hive queries and UDFs to analyze/transform the data in HDFS.
Developed Hive scripts for implementing control-table logic in HDFS.
Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
Developed Pig scripts and UDFs as per the business logic.
Developed user-defined functions in Pig using Python.
Analyzed and transformed data with Hive and Pig.
Developed Oozie workflows, scheduled through a scheduler to run on a monthly basis.
Designed and developed read lock capability in HDFS.
Implemented a Hadoop float equivalent of the DB2 DECIMAL type.
Involved in end-to-end implementation of ETL logic.
Coordinated effectively with the offshore team and managed project deliverables on time.
Worked on QA support activities, test data creation and unit testing activities.

Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, XML, ETL, DB2 and QA

Big Data Engineer

Confidential, San Jose, CA

Responsibilities:

Review Tidal Enterprise Scheduler jobs and verify that all jobs are triggered at the appropriate time and complete successfully.
Investigate failed jobs, perform root cause analysis (RCA) and fix the issues.
Develop scripts using Python for the data lake framework.
Assist project teams with their queries on data lake tables.
Investigate and fix any data-mismatch issues identified or raised by the project teams.
Coordinate with project teams on any data lake related changes and take initiatives such as standardizing naming conventions.
Handle ad hoc table refresh requests and assist project teams with testing and validation during go-live.
Coordinate and help fix issues with the data catalogue when it goes down.
Assist project teams with approvals and access management queries for the tables.
Load any PVs that require large data volumes using TPT or by max update date, and verify that the loads succeed.
Coordinate with the Teradata DBA team and Teradata support team on any TPT related queries/issues.
Worked on Scala 2.10 jobs using Spark 1.6.0 for data processing using RDDs and the DataFrame API.
Performance tuning of Spark and Sqoop jobs.
Coordinate with project teams on their ingestion requests.
Review all the metadata they provide, including database credentials, incremental columns and uniqueness of merge keys.
Discuss with the Architect team to get their approval to proceed with metadata creation.
Create metadata and get it reviewed by the Architect team.
Load the tables, take the audits and share them with the teams.
Schedule jobs in TES and validate that they succeed.
Ingested data from AWS cloud to the Hadoop data lake.
Enterprise Data and Analytics: in the process of automating Hadoop.
Involved in requirement gathering, discussions with project teams and Joint Application Design meetings; build/update user stories.
Validate requirements, review screen design prototypes, test the functionality, participate in User Acceptance Testing and take an active part in go-live activities.

Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, Jira, XML, DB2 and QA.

Big Data Engineer

Confidential, Milpitas, CA

Responsibilities:

As a member of the Center of Excellence team, get involved in application issues, triage and investigate them, then build and fix them.
Monitor Dataproc clusters and jobs using the GCP Console and monitor dashboards using Stackdriver; tune and optimize memory-intensive jobs and provide L3 support for the applications in the production environment.
DAS (Data as a Service) in GCP using Google Cloud Storage buckets.
Monitor Azkaban jobs on-prem (Hortonworks distribution) and in GCP (Google Cloud Platform).
Investigate failed jobs, perform root cause analysis (RCA) and fix the issues.
As part of the Production Engineering team, keep the environment healthy 24/7.
Monitor Azkaban, Ambari, Splunk and Net Diagnostics for HBase Timeouts, Propensities.
GCP: Stackdriver for monitoring and logging, Compute Engine and Dataproc.
Join discussions with project teams to understand issues, then investigate, perform RCA and fix them.
Handle production issues assigned as JIRA tasks and Incidents / Requests raised through Remedy.
Take part in discussions, take initiatives and work with Application teams for the smooth flow of projects.
Take part in all technical discussions and scrum meetings.
Review project documents received from other Technical Specialists, Business Technical Specialists, and Project Managers to ensure quality, completeness, and adherence to documentation standards.

Environment: Hadoop, Java, HDFS, Jira, Azkaban, MapReduce, Hive, Sqoop, Pig, XML, ETL, DB2 and QA

Java/Hadoop Developer

Confidential, San Diego, CA

Responsibilities:

Installed and configured a Hadoop cluster along with Hive.
Developed a MapReduce application for pre-processing using Hadoop MapReduce, a framework for processing large data sets in parallel across the Hadoop cluster.
Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
Responsible for designing Front end system using HTML, CSS, JSP, Servlets and Ajax.
Transformed the web application into a compatible mobile & tablet application by creating responsive designs using HTML & CSS.
Used LDAP for user Authentication and authorization.
Created Stored Procedures, Views, Cursors and functions to support application.
Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs as a directed acyclic graph (DAG) of actions with control flows.
Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS and executed them with Hive queries.
Experienced in managing and reviewing Hadoop log files.
Responsible to manage data coming from different sources.
Assisted in monitoring the Hadoop cluster using Ganglia tool.
Dealing with high volume of data in the cluster.
Tested and reported defects in an Agile Methodology perspective.
Consolidate all defects, report it to PM/Leads for prompt fixes by development teams and drive it to closure.
Installed and configured a Hadoop cluster (3-node cluster) in fully distributed mode.
Installed Hadoop ecosystem components (Hive, Pig, Sqoop, HBase, Oozie) on top of the Hadoop cluster.
Imported data from Oracle to HDFS & Hive for analytical purposes.
Analyzed imported data in HDFS & Hive using HiveQL and custom MapReduce programs in Java.

Environment: Java, CDH, Hadoop, HDFS, Map Reduce, Hive and Sqoop

Java Developer

Confidential

Responsibilities:

Actively involved in the analysis, design, implementation and deployment phases of the project's full software development life cycle (SDLC).
Designed and developed user interface using JSP, HTML and JavaScript.
Developed JSP Custom Tag Libraries for Tree Structure and Grid using Pagination Logics.
Worked extensively with JSPs and Servlets to accommodate all presentation customizations on the front end.
Used build tools like Maven to build, package, test and deploy the application to the application server.
Developed Struts action classes and action forms, performed action mapping using the Struts framework, and performed data validation in form beans and action classes.
Extensively used the Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
Defined the search criteria and pulled the customer's record from the database.
Made the required changes and saved the updated record back to the database.
Validated the fields of user registration screen and login screen by writing JavaScript validations.
Involved in developing and coding the Interfaces and classes required for the application and created appropriate relationships between the system classes and the interfaces provided.
Developed build and deployment scripts using Apache Ant to customize WAR and EAR files.
Used DAO and JDBC for database access.
Developed stored procedures and triggers using PL/SQL to calculate and update the tables to implement business logic.
Used SVN to maintain source and version management.
Used JIRA to manage issues and project workflow.
Involved in peer code reviews and performed integration testing of the modules. Followed coding and documentation standards.
Involved in post-production support and maintenance of the application.

Environment: Oracle, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat, PL/SQL, JIRA, SVN.
