Hadoop Developer Resume

Salt Lake City, Utah

PROFESSIONAL SUMMARY:

  • 6+ years of experience in analysis, design, development, testing, implementation, maintenance, and enhancement on various IT projects, including Big Data experience implementing end-to-end Hadoop solutions.
  • Experience in working in environments using Agile (SCRUM) and Test-Driven development methodologies.
  • Extensive experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive & Spark.
  • Experience with different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks (HDP).
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Good understanding of integrating various data sources like RDBMS, spreadsheets, text files, JSON, and XML files.
  • Good working experience in using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Apache NiFi, Sqoop, Spark) and NoSQL databases like HBase.
  • Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
  • Worked with the Hadoop ecosystem, concentrating mostly on MapReduce, Apache Spark, and PL/SQL.
  • Hands on experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Good knowledge of SQL and PL/SQL for writing stored procedures, functions, and unit test cases.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Experience working with Hive and related tools on Hadoop, including performance tuning, file formats, designing and executing complex HiveQL queries, and data migration and conversion.
  • Served as a hands-on subject matter expert for automation in an AWS infrastructure environment.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse.
  • Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa.
  • Hands on experience working with databases like SQL Server 2010 and MySQL.
  • Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, and ZooKeeper.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, logistic regression, and gradient boosting machines, to build predictive models using the scikit-learn package in Python.
  • Experience with advanced J2EE frameworks such as Spring, JSF, and Hibernate.
  • Expertise in using XML and related technologies such as XSD and XSLT, as well as JSON.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
  • Strong scripting skills in Python and Unix shell.

TECHNICAL SKILLS:

Big Data Technologies: Apache Hadoop, Apache NiFi, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Oozie, Apache ZooKeeper, AutoSys.

Hadoop Distributions: Cloudera, Hortonworks.

Development Tools: IntelliJ IDEA, Eclipse, PuTTY

Programming Languages: Scala, PySpark, Java, HQL

Build Tools: Maven, SBT

Query Tools: AQT (Advanced Query Tool)

NoSQL Databases: HBase

Version Control Tools: GitHub, SVN

Methodologies: Agile (Scrum), Waterfall

Databases: MySQL, Oracle.

Operating Systems: Windows 7/10, Linux (CentOS, Ubuntu), macOS

PROFESSIONAL EXPERIENCE:

Confidential, Salt Lake City, Utah

Hadoop Developer

Responsibilities:

  • Worked in an Agile team and completed stories related to ingestion, transformation, and publication of data on time.
  • Developed solutions to load data into HDFS and analyzed the data using MapReduce and Hive to produce summary results for downstream systems.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
  • Used shell scripting for code development.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Documented test cases for each task after development and unit testing were complete.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
  • Worked on an Apache NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
  • Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, package, and MySQL.
  • Optimized Hive tables using techniques such as partitioning and bucketing.
  • Strong data modelling and data mapping experience.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Worked with the cloud platform to publish code and deploy changes to AWS dev/UAT servers.
  • Scheduled jobs in the production environment using the AutoSys job scheduler.
  • Involved in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.
  • Helped create end-to-end process documentation for a couple of projects and made it available to the business.
  • Analyzed the Hadoop cluster and various Big Data analytics tools, including Hive and Sqoop.
  • Helped the team lead and the team resolve production support issues and made sure that all jobs ran successfully.
  • Good working experience with model-based testing tools that generate test inputs or test cases from stored information (AWS Cloud).
  • Worked with visual modelling tools like flowcharts, pictures, and diagrams.
  • Worked on AWS to integrate server-side and client-side code.

Environment: HDFS, Hive, Sqoop, AutoSys job scheduler, AWS, Shell Scripts, HBase, AQT (Advanced Query Tool), GitHub, Apache, PuTTY, ServiceNow, Cloudera, Spark, PySpark, Hortonworks.

Confidential, Minneapolis, MN

Hadoop Developer

Responsibilities:

  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala.
  • Developed solutions to load data into HDFS and analyzed the data using MapReduce, Pig, and Hive to produce summary results for downstream systems.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
  • Maintained existing data migration program with occasional upgrades and enhancements.
  • Scheduled several time-based Oozie workflows by developing Python scripts.
  • Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores such as Hive.
  • Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, package, and MySQL.
  • Worked on data migration/ETL from Teradata to Hadoop.
  • Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
  • Optimized Hive tables using techniques such as partitioning and bucketing.
  • Worked in a data-centric role involving data migration and defining a data framework for reporting.
  • Involved in migrating Hive queries to Impala.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.
  • Built intermediate database creation scripts and data validation scripts, and tested the extracted data.
  • Worked on fetching picture files from AWS for the UI.
  • Analyzed the Hadoop cluster and various Big Data analytics tools, including Hive, HBase, and Sqoop.
  • Created a Python/Django-based web application using Python scripting for data processing, MySQL for the database, and HTML/CSS/jQuery and Highcharts for data visualization on the served pages.

Environment: HDFS, MapReduce, Hive, Apache, Sqoop, AWS, Oozie Scheduler, Shell Scripts, HBase, Cloudera, Kafka, Spark, Scala, Hortonworks.

Confidential, Columbus, OH

Big Data/Hadoop Developer

Responsibilities:

  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop MapReduce using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Participated in weekly meetings with technical collaborators and in code review sessions with senior and junior developers.
  • Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using Python and Java MapReduce.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Knowledge of designing and deploying Hadoop clusters and various Big Data analytics tools, including Hive, HBase, Oozie, Sqoop, and Spark.
  • Implemented real-time streaming of data using Spark with Kafka.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Wrote and tested code for the ingest automation process (full and incremental loads). Designed the solution and developed the program for data ingestion using Sqoop, MapReduce, shell scripts, and Python.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
  • Implemented Spark RDD transformations and actions to support business analysis.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Built various graphs for business decision making using the Python matplotlib library.
  • Used the Python library BeautifulSoup for web scraping to extract data for building graphs.
  • Created Hive tables and was involved in loading data and writing Hive UDFs.

Environment: Scala, Spark, Spark Streaming, Spark SQL, HDFS, AWS, Apache, Hive, Pig, Linux, Oozie, MapReduce, Apache Kafka, Sqoop, S3.

Confidential

Hadoop Developer

Responsibilities:

  • Provided an application demo to the client by designing and developing a search engine and report analysis trends.
  • Assisted the architect in analyzing the existing and future systems. Prepared design blueprints and application flow documentation.
  • Experienced in managing and reviewing Hadoop log files. Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources and applications. Supported MapReduce programs running on the cluster.
  • Responsible for working with message broker systems such as Kafka. Extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
  • Created MapReduce jobs to extract the contents from HBase and configured them in Oozie workflows to generate analytical reports.
  • Wrote MapReduce programs to organize the data and make it suitable for analytics in the client-specified format.
  • Hands-on experience in writing Python scripts to optimize performance.
  • Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
  • Involved in writing Spark applications using Scala. Hands-on experience in creating RDDs, transformations, and actions while implementing Spark applications.
  • Stored the results derived from analysis in HBase and made them available for ingestion into Solr for indexing.
  • Documented all the challenges and issues involved in dealing with the security system and implemented best practices.
  • Created project structures and configurations according to the project architecture and made them available to the junior developers to continue their work.

Environment: Pig, Scala, Kafka, Hive, MapReduce, Apache, Sqoop, ZooKeeper, AWS, SVN, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3.

Confidential

Java Developer

Responsibilities:

  • Gathered system requirements for the application, worked with the business team to review them, and went through the Software Requirement Specification and architecture documents.
  • Involved in intensive User Interface (UI) operations and client-side validations using the AJAX toolkit.
  • Used SOAP to expose company applications as a Web Service to outside clients.
  • Used a logging package for debugging.
  • Used web services to create rate summaries, used WSDL and SOAP messages to get insurance plans from a different module, and used XML parsers for data retrieval.
  • Developed business components and integrated them using Spring features such as dependency injection and auto-wiring of components such as DAO layers and service proxy layers.
  • Used Spring AOP to implement distributed declarative transactions throughout the application.
  • Wrote Hibernate configuration XML files to manage data persistence.
  • Worked on the Delete Printer module using Python.
  • Worked extensively with Python and REST APIs.
  • Used TOAD to generate SQL queries for the applications and to view reports from log tables.
  • Involved in migrating data from Excel, flat files, Oracle, and XML files to SQL Server using the BCP and DTS utilities.

Environment: Java/J2EE, HTML, Axis, Servlets, Web Services, Apache, RESTful Web Services, Spring, DB2, RAD, Rational ClearCase, AWS, WCF, AJAX.
