Sr. Big Data Engineer/Hadoop Developer Resume

Scottsdale, AZ

PROFESSIONAL SUMMARY

  • 8+ years of experience in analysis, design, development, testing, implementation, maintenance and enhancement of various IT projects, including Big Data experience implementing end-to-end Hadoop solutions.
  • Experience in working in environments using Agile (SCRUM), RUP and Test-Driven development methodologies.
  • In-depth knowledge of Hadoop ecosystem tools like MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Spark, Oozie and ZooKeeper.
  • Excellent understanding and extensive knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Strong hands-on experience with Apache Hadoop along with the enterprise distributions of Cloudera and Hortonworks.
  • Proficient in data modelling, use case design and object-oriented concepts.
  • Well versed in installing, configuring, supporting and managing Big Data and the underlying infrastructure of Hadoop clusters.
  • Expertise in Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.
  • Extensively worked on Spark streaming and Apache Kafka to fetch live stream data.
  • Expertise in converting Hive/SQL queries into RDD transformations using Apache Spark and Python.
  • Implemented Dynamic Partitions and Buckets in HIVE for efficient data access.
  • Experience in data processing like collecting, aggregating, moving from various sources using Apache Flume and Kafka.
  • Involved in integrating Hive queries into the Spark environment using Spark SQL.
  • Hands-on experience in performing real-time analytics on big data using HBase and Cassandra in Kubernetes and Hadoop clusters.
  • Experience in using Flume to stream data into HDFS.
  • Skilled in using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Played an active role in developing data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store in HDFS.
  • Good knowledge in using job scheduling and monitoring tools like Oozie and Zookeeper.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
  • Good working experience with different file formats (Parquet, text file, Avro, ORC) and different compression codecs (GZIP, Snappy, LZO).
  • Experience in installation & configuration of Apache Hadoop on Amazon AWS (EC2) system.
  • Worked extensively on configuring Auto Scaling for high availability.
  • Proficient in Python programming; processed huge datasets on Hadoop clusters using PySpark.
  • Experience in using Informatica PowerCenter as an ETL tool.
  • Good experience in Teradata RDBMS using FastLoad, MultiLoad, TPump, FastExport, Teradata SQL Assistant and BTEQ utilities.
  • Created numerous dashboards using Tableau and Power BI to visualize and forecast large volumes of aggregated and analyzed data.
  • Sound knowledge of using version control tools like GIT, VSS, SVN and PVCS.
  • Experience developing Servlets and JSPs in an Apache/Tomcat environment.
  • Experience with Scala using frameworks such as Scalatra, Specs2.
  • Adept in SQL and PL/SQL to write Stored Procedures and Functions and writing unit test cases using JUnit.
  • Experience working with JAVA J2EE, JDBC, ODBC, JSP, Java Eclipse, Java Beans, EJB, Servlets.
  • Experienced in developing SOAP and REST web services, including with the Axis framework.
  • Familiarity with build tools like Ant, Maven, SBT, and Gradle to build and deploy applications into server.
  • Very good experience in writing technical design documents and in the development, testing and implementation of enterprise-level data marts and data warehouses.
  • Highly skilled in mathematics along with problem solving and analytical skills.

PROFESSIONAL EXPERIENCE

Sr. Big Data Engineer/Hadoop Developer

Confidential, Scottsdale, AZ

Responsibilities:

  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using PySpark.
  • Processed data into HDFS by developing solutions and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analysed them by running Hive queries and Pig scripts.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Developed Spark code by using Python and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables.
  • Scheduled several time-based Oozie workflows by developing Python scripts.
  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell script, Sqoop, package and MySQL.
  • Optimized Hive tables using techniques like partitioning and bucketing to provide better performance.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Python.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
  • Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Improved performance by tuning Hive and MapReduce.

Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, Cassandra, Cloudera, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.

Big Data Engineer/Hadoop Developer

Confidential, Bothell, WA

Responsibilities:

  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Ingested the data from source system into HDFS flat file system using Linux shell scripting.
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Migrated an existing Java application into microservices using Spring Boot and Spring Cloud.
  • Working knowledge in different IDEs like Eclipse, Spring Tool Suite.
  • Working knowledge of using GIT, ANT/Maven for project dependency / build / deployment.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developed Spark code using PySpark and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked as a part of AWS build team.
  • Created, configured and managed S3 buckets (storage).
  • Experience with AWS EC2, EMR, Lambda and CloudWatch.
  • Applied the Jenkins AWS CodeDeploy plugin to deploy to AWS and migrate applications to the AWS cloud.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions to implement business analysis.
  • Migrated HiveQL queries on structured data into Spark SQL to improve performance.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Worked on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
  • Worked on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
  • Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
  • Administered, installed, upgraded and managed distributions of Hadoop, Hive and HBase.
  • Involved in performance troubleshooting and tuning of Hadoop clusters.
  • Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs.
  • Implemented business logic by writing Hive UDFs in Java.
  • Developed shell scripts and some Perl scripts based on user requirements.
  • Wrote XML scripts to build Oozie functionality.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
  • Built datasets, lenses and visualization charts/graphs in the Platfora environment.
  • Evaluated the suitability of Hadoop and its ecosystem for the project, and implemented and validated various proof of concept (POC) applications to eventually adopt them as part of the Big Data Hadoop initiative.

Environment: MapReduce, HDFS, Spring Boot, Microservices, AWS, PySpark, Hive, Pig, SQL, Sqoop, Oozie, Shell scripting, Cron Jobs, Apache Kafka, J2EE.

Big Data Engineer/Hadoop Developer

Confidential, Fort Worth, TX

Responsibilities:

  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
  • Processed data into HDFS by developing solutions and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Developed Spark code by using Scala and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables.
  • Developed Scala scripts and UDAFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
  • Maintained existing data migration program with occasional upgrades and enhancements.
  • Scheduled several time-based Oozie workflows by developing Python scripts.
  • Implemented Sqoop jobs for large sets of structured and semi-structured data migration between HDFS and/or other data storage like Hive.
  • Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell script, Sqoop, package and MySQL.
  • Worked on the Context Filters and Cascading Filters page, implemented various parameters, filters, table calculations, aggregations and specific conditions, and created analytical reports.
  • Worked on data migration/ETL from Teradata to Hadoop.
  • Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy and consistency.
  • Optimized Hive tables using techniques like partitioning and bucketing to provide better performance.
  • Developed Spark code using PySpark and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked in a data centric role involving data migration in defining data framework for reporting.
  • Involved in migration of Hive queries to Impala.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
  • Experience in building intermediate database creation scripts and data validation scripts, and in testing the extracted data.
  • Worked as a part of AWS build team.
  • Created, configured and managed S3 buckets (storage).
  • Experience with AWS EC2, EMR, Lambda and CloudWatch.
  • Applied the Jenkins AWS CodeDeploy plugin to deploy to AWS and migrate applications to the AWS cloud.
  • Worked on fetching picture files from AWS to the UI.
  • Analyzed the Hadoop cluster and different Big Data analytic tools including Hive, HBase and Sqoop.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions to implement business analysis.

Environment: HDFS, MapReduce, Hive, Apache, Sqoop, AWS, Oozie Scheduler, Shell Scripts, Cloudera, Kafka, Spark, Spring Boot, Microservices.

Big data/Hadoop Developer

Confidential, Greenwood Village, CO

Responsibilities:

  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Imported data from MySQL into HDFS using Sqoop on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop MapReduce using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using Python and Java MapReduce.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Knowledge of designing and deployment of Hadoop cluster and different Big Data analytic tools including Hive, HBase, Oozie, Sqoop, Spark.
  • Performed real-time streaming of data using Spark with Kafka.
  • Responsible for developing data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
  • Hands-on coding: wrote and tested the code for the ingest automation process (full and incremental loads). Designed the solution and developed the program for data ingestion using Sqoop, MapReduce, shell script and Python.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
  • Experienced in implementing Spark RDD transformations and actions to implement business analysis.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Worked on a 400-node cluster.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications, and executed machine learning use cases under Spark ML and MLlib.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Involved in importing the real-time data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Built various graphs for business decision making using Python matplotlib library.
  • Used the Python library Beautiful Soup for web scraping to extract data for building graphs.
  • Created Hive tables and was involved in data loading and writing Hive UDFs.

Environment: Scala, Spark, Spark-Core, Spark SQL, HDFS, AWS, Apache, Hive, Linux, Oozie, MapReduce, Cloudera, Apache Kafka, Sqoop, S3.

Hadoop Developer

Confidential, Boston, MA

Responsibilities:

  • Gathered project requirements and liaised with Business stakeholders to gain the application knowledge and architect solutions for existing problems.
  • Played an active role in end-to-end implementation of the Hadoop infrastructure using Pig, Sqoop, Hive, and Spark and migrated the data from legacy systems (like Teradata, MySQL) into HDFS.
  • Loaded CSV/TXT/Avro/Parquet files using Python/Java in the Spark framework, processed the data by creating Spark DataFrames and RDDs, and saved the files in Parquet format in HDFS to load into the fact table using ORC Reader.
  • Ingested data from Teradata using Sqoop into HDFS and worked with highly unstructured and semi-structured data.
  • Developed Oozie workflows to perform daily, weekly and monthly incremental loads into Hive tables.
  • Migrated complex Map Reduce programs into Spark RDD transformations using PySpark.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Ingested data in mini-batches and performed RDD transformations on those mini-batches of data.
  • Good knowledge in setting up batch intervals, split intervals, and window intervals in Spark Streaming.
  • Used Oozie workflow engine to run multiple jobs which run independently.
  • Worked on Kafka while dealing with raw data, by transforming into new Kafka topics for further consumption.
  • Involved in creating Hive Tables, loading with data, and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive (Hadoop) tables.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Created a POC to orchestrate the migration from existing Teradata platform to Hadoop infrastructure to increase the efficiency of the data used for analytics and decision making.
  • Utilized the Informatica PowerCenter ETL tool to extract data from heterogeneous sources and load it into the target systems.
  • Implemented the Slowly Changing Dimensions to capture the updated master data and load into the target Teradata system according to the business logic.
  • Created mapping variables and data flow logic from source to target systems.
  • Designed Tasks, Workflows, Worklets and sessions using Workflow Manager Tool and Monitored workflow using Informatica Workflow manager.
  • Participated in daily status calls with internal team and weekly calls with client and updated the status report.
  • Ability to work with onsite and offshore team members.

Environment: Hadoop, MapReduce, HDFS, HiveQL, Pig, Java, Spark, Kafka, AWS, SBT, Maven, Sqoop, ZooKeeper, Python, Informatica PowerCenter, Teradata.

Java Developer

Confidential

Responsibilities:

  • Gathered system requirements for the application, worked with the business team to review them, and went through the Software Requirement Specification and Architecture documents.
  • Involved in intense User Interface (UI) operations and client-side validations using AJAX toolkit.
  • Used SOAP to expose company applications as a Web Service to outside clients.
  • Used the log package for debugging.
  • Used Web Services for creating the rate summary, used WSDL and SOAP messages for getting insurance plans from different modules, and used XML parsers for data retrieval.
  • Developed business components and integrated them using Spring features such as Dependency Injection and auto-wiring of components such as DAO layers and service proxy layers.
  • Used Spring AOP to implement distributed declarative transactions throughout the application.
  • Wrote Hibernate configuration XML files to manage data persistence.
  • Worked on the Delete Printer module using Python.
  • Worked extensively with Python and REST APIs.
  • Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
  • Involved in the migration of data from Excel, flat files, Oracle and XML files to SQL Server using the BCP and DTS utilities.

Environment: Java/J2EE, HTML, Axis, Servlets, Web services, Apache, RESTful Web Services, Spring, DB2, RAD, Rational ClearCase, AWS, WCF, AJAX.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Sqoop, Hive, Pig, HBase, ZooKeeper, Cluster configuration, Flume, AWS

Distributions: Cloudera

Java Technologies: Core Java, JDBC, HTML, JSP, Servlets, Tomcat, JavaScript

Databases: SQL, NoSQL (HBase), MySQL, Oracle, PL/SQL

Programming Languages: C, C++, Java, SQL, Shell, Python

IDE's Utilities: Eclipse

Web Technologies: J2EE, JMS, Web Service

Protocols: TCP/IP, SSH, HTTP and HTTPS

Scripting: HTML, JavaScript, CSS, XML and Ajax

Operating System: Windows, Mac, Linux and UNIX

IDE: Eclipse, Microsoft Visual Studio 2008, 2012, Flex Builder

Version control: Git, SVN, CVS

Tools: FileZilla, Putty, PL/SQL Developer, JUnit
