Hadoop Developer Resume Salt Lake City, Utah - Hire IT People

PROFESSIONAL SUMMARY:

6+ years of experience in analysis, design, development, testing, implementation, maintenance and enhancement on various IT projects, including Big Data experience implementing end-to-end Hadoop solutions.
Experience in working in environments using Agile (SCRUM) and Test-Driven development methodologies.
Extensive experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive & Spark.
Experience with different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks (HDP).
Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
Good understanding of integrating various data sources like RDBMS, spreadsheets, text files, JSON and XML files.
Good working experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Apache NiFi, Sqoop, Spark) and NoSQL databases like HBase.
Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
Worked with the Hadoop ecosystem, mostly concentrating on MapReduce, Apache Spark and PL/SQL.
Hands-on experience in configuring and administering Hadoop clusters using major Hadoop distributions like Apache Hadoop and Cloudera.
Good knowledge of SQL and PL/SQL for writing stored procedures, functions and unit test cases.
Used Amazon CloudWatch to monitor and track resources on AWS.
Experience working with Hive and related tools on Hadoop, including performance tuning, file formats, designing and executing complex Hive HQL queries, and data migration and conversion.
Served as a hands-on subject matter expert for automation in an AWS infrastructure environment.
Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse.
Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa.
Hands-on experience working with databases like SQL Server 2010 and MySQL.
Strong knowledge of the Hadoop ecosystem including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper etc.
Applied various machine learning algorithms and statistical modeling techniques such as decision trees, logistic regression and Gradient Boosting Machine to build predictive models using the scikit-learn package in Python.
Experience with advanced J2EE frameworks such as Spring, JSF and Hibernate.
Expertise in using XML related technologies such as XML, XSD, XSLT, JSON.
Experience in managing and reviewing Hadoop log files.
Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
Strong scripting skills in Python and Unix shell.

TECHNICAL SKILLS:

Big Data Technologies: Apache Hadoop, Apache NiFi, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Oozie, Apache Zookeeper, AutoSys.

Hadoop Distributions: Cloudera, Hortonworks.

Development Tools: IntelliJ IDEA, Eclipse, PuTTY

Programming Languages: Scala, PySpark, Java, HQL

Build Tools: Maven, SBT

Query Tools: AQT (Advanced Query Tool)

NoSQL Databases: HBase

Version Control Tools: GitHub, SVN

Methodologies: Agile (Scrum), Waterfall

Databases: MySQL, Oracle.

Operating Systems: Windows 7/10, Linux (CentOS, Ubuntu), Mac OS

PROFESSIONAL EXPERIENCE:

Confidential, Salt Lake City, Utah

Hadoop Developer

Responsibilities:

Working in an Agile environment, completed stories related to ingestion, transformation and publication of data on time.
Developed solutions to load data into HDFS and analyzed the data using MapReduce and Hive to produce summary results from Hadoop for downstream systems.
Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
Used shell scripting for the code development.
Created Managed tables and External tables in Hive and loaded data from HDFS.
Developed test case documentation for each task after development and unit testing were complete.
Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
Worked on an Apache NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, package and MySQL.
Optimized the Hive tables using techniques like partitioning and bucketing to provide better performance.
Strong data modelling and data mapping experience.
Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
Worked with the cloud platform to publish code and deploy changes to AWS dev/UAT servers.
Scheduled jobs in the production environment using the AutoSys job scheduler.
Involved in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
Helped create end-to-end process documentation for a couple of projects and made it available to the business.
Analyzed the Hadoop cluster and different Big Data analytic tools including Hive and Sqoop.
Helped the team lead and the team resolve production support issues and made sure that all jobs ran successfully.
Good working experience with model-based testing tools that generate test inputs or test cases from stored information (AWS Cloud).
Worked with visual modelling tools like flowcharts, pictures and diagrams.
Worked on AWS to integrate server-side and client-side code.

Environment: HDFS, Hive, Sqoop, AutoSys job scheduler, AWS, Shell Scripts, HBase, AQT (Advanced Query Tool), GitHub, Apache, PuTTY, ServiceNow, Cloudera, Spark, PySpark, Hortonworks.

Confidential, Minneapolis, MN

Hadoop Developer

Responsibilities:

Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
Developed solutions to load data into HDFS and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
Created Managed tables and External tables in Hive and loaded data from HDFS.
Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
Maintained existing data migration program with occasional upgrades and enhancements.
Scheduled several time-based Oozie workflows by developing Python scripts.
Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores such as Hive.
Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, package and MySQL.
Worked on data migration/ETL from Teradata to Hadoop.
Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy and consistency.
Optimized the Hive tables using techniques like partitioning and bucketing to provide better performance.
Worked in a data-centric role involving data migration and defining the data framework for reporting.
Involved in migrating Hive queries to Impala.
Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
Involved in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
Built intermediate database creation scripts and data validation scripts, and tested the extracted data.
Worked on fetching picture files from AWS for display in the UI.
Analyzed the Hadoop cluster and different Big Data analytic tools including Hive, HBase and Sqoop.
Created a Python/Django-based web application using Python scripting for data processing, MySQL for the database, and HTML/CSS/jQuery and Highcharts for data visualization on the served pages.

Environment: HDFS, MapReduce, Hive, Apache, Sqoop, AWS, Oozie Scheduler, Shell Scripts, HBase, Cloudera, Kafka, Spark, Scala, Hortonworks.

Confidential, Columbus, OH

Big Data/Hadoop Developer

Responsibilities:

Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
Used Sqoop to import data from MySQL into HDFS on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote Hive queries for data analysis to meet the business requirements.
Explored Spark to improve the performance and optimization of existing Hadoop MapReduce algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using Python and Java MapReduce.
Used Amazon CloudWatch to monitor and track resources on AWS.
Knowledge of designing and deploying Hadoop clusters and different Big Data analytic tools including Hive, HBase, Oozie, Sqoop and Spark.
Implemented real-time streaming of data using Spark with Kafka.
Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
Hands-on coding: wrote and tested code for the ingest automation process (full and incremental loads). Designed the solution and developed the program for data ingestion using Sqoop, MapReduce, shell scripts and Python.
Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
Implemented Spark RDD transformations and actions to support business analysis.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Involved in importing the real-time data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
Built various graphs for business decision making using the Python matplotlib library.
Used the Python library BeautifulSoup for web scraping to extract data for building graphs.
Created Hive tables and was involved in loading data and writing Hive UDFs.

Environment: Scala, Spark, Spark Streaming, Spark SQL, HDFS, AWS, Apache, Hive, Pig, Linux, Oozie, MapReduce, Apache Kafka, Sqoop, S3.

Confidential

Hadoop Developer

Responsibilities:

Provided an application demo to the client by designing and developing a search engine and report analysis trends.
Assisted the architect in analyzing the existing and future systems. Prepared design blueprints and application flow documentation.
Managed and reviewed Hadoop log files. Loaded and transformed large sets of structured, semi-structured and unstructured data.
Managed data coming from different sources and applications. Supported MapReduce programs running on the cluster.
Worked with message broker systems such as Kafka. Extracted data from mainframes, fed it to Kafka and ingested it into HBase to perform analytics.
Created MapReduce jobs to extract contents from HBase and configured them in Oozie workflows to generate analytical reports.
Wrote MapReduce programs to organize the data and convert it into a client-specified format suitable for analytics.
Hands-on experience in writing Python scripts to optimize performance.
Extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
Involved in writing Spark applications using Scala. Hands-on experience in creating RDDs, transformations and actions while implementing Spark applications.
Stored the results derived from analysis in HBase and made them available for ingestion into Solr for indexing.
Documented all the challenges and issues involved in dealing with the security system and implemented best practices.
Created project structures and configurations according to the project architecture and made them available to junior developers to continue their work.

Environment: Pig, Scala, Kafka, Hive, MapReduce, Apache, Sqoop, Zookeeper, AWS, SVN, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3.

Confidential

Java Developer

Responsibilities:

Gathered system requirements for the application, worked with the business team to review them, and went through the Software Requirement Specification and Architecture documents.
Involved in intensive User Interface (UI) operations and client-side validations using the AJAX toolkit.
Used SOAP to expose company applications as a Web Service to outside clients.
Used the log package for debugging.
Used Web Services to create the rate summary, WSDL and SOAP messages to get insurance plans from different modules, and XML parsers for data retrieval.
Developed business components and integrated them using Spring features such as Dependency Injection and auto-wiring of components such as DAO layers and service proxy layers.
Used Spring AOP to implement distributed declarative transactions throughout the application.
Wrote Hibernate configuration XML files to manage data persistence.
Worked on the Delete Printer module using Python.
Worked extensively with Python and REST APIs.
Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
Involved in the migration of data from Excel, flat files, Oracle and XML files to SQL Server using the BCP and DTS utilities.

Environment: Java/J2EE, HTML, Axis, Servlets, Web Services, Apache, RESTful Web Services, Spring, DB2, RAD, Rational ClearCase, AWS, WCF, AJAX.
