Hadoop Developer Resume
Salt Lake City, Utah
PROFESSIONAL SUMMARY:
- 6+ years of experience in Analysis, Design, Development, Testing, Implementation, Maintenance and Enhancement on various IT projects, including Big Data experience implementing end-to-end Hadoop solutions.
- Experience in working in environments using Agile (SCRUM) and Test-Driven development methodologies.
- Extensive experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive & Spark.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks (HDP).
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Good understanding in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Good working experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Apache NiFi, Sqoop, Spark) and NoSQL databases like HBase.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Worked with the Hadoop ecosystem, mostly concentrating on MapReduce, Apache Spark and PL/SQL.
- Hands-on experience in configuring and administering Hadoop clusters using major Hadoop distributions like Apache Hadoop and Cloudera.
- Good knowledge in SQL and PL/SQL to write Stored Procedures and Functions and writing unit test cases.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Experience working with Hive and related tools on Hadoop, including performance tuning, file formats, designing and executing complex HiveQL queries, and data migration and conversion.
- Served as a hands-on subject matter expert for automation in an AWS infrastructure environment.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse.
- Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa.
- Hands-on experience working with databases like SQL Server 2010 and MySQL.
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, ZooKeeper, etc.
- Applied machine learning algorithms and statistical models such as decision trees, logistic regression and gradient boosting machines to build predictive models using the scikit-learn package in Python.
- Experience with advanced J2EE frameworks such as Spring, JSF and Hibernate.
- Expertise in using XML related technologies such as XML, XSD, XSLT, JSON.
- Experience in managing and reviewing Hadoop log files.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Strong scripting skills in Python and Unix shell.
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, Apache NiFi, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Oozie, Apache ZooKeeper, AutoSys.
Hadoop Distributions: Cloudera, Hortonworks.
Development Tools: IntelliJ IDEA, Eclipse, PuTTY
Programming Languages: Scala, PySpark, Java, HQL
Build Tools: Maven, SBT
Query Tools: AQT (Advanced Query Tool)
NoSQL Databases: HBase
Version Control Tools: GitHub, SVN
Methodologies: Agile (Scrum), Waterfall
Databases: MySQL, Oracle.
Operating Systems: Windows 7/10, Linux (CentOS, Ubuntu), macOS
PROFESSIONAL EXPERIENCE:
Confidential, Salt Lake City, Utah
Hadoop Developer
Responsibilities:
- Working in an Agile team, completed stories related to ingestion, transformation and publication of data on time.
- Developed solutions to load data into HDFS and analyzed the data using MapReduce and Hive to produce summary results for downstream systems.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
- Used shell scripting for the code development.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Developed test case documentation for each task after development and unit testing were complete.
- Developed Spark code by using Scala and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables.
- Worked on an Apache NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
- Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, packages and MySQL.
- Optimized the Hive tables using optimization techniques like partitioning and bucketing to provide better performance.
- Strong data modelling and data mapping experience.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Worked with the cloud platform to publish code and deploy changes to AWS dev/UAT servers.
- Scheduled jobs in the production environment using the AutoSys job scheduler.
- Involved in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
- Helped create end-to-end process documentation for a couple of projects and made it available to the business.
- Analyzed the Hadoop cluster and different Big Data analytic tools including Hive and Sqoop.
- Helped the team lead and the team resolve production support issues and ensured that all jobs ran successfully.
- Good working experience with model-based testing tools that generate test inputs or test cases from stored information (AWS Cloud).
- Worked with visual modelling tools like flowcharts, pictures and diagrams.
- Worked on AWS to integrate server-side and client-side code.
Environment: HDFS, Hive, Sqoop, AutoSys job scheduler, AWS, Shell Scripts, HBase, AQT (Advanced Query Tool), GitHub, Apache, PuTTY, ServiceNow, Cloudera, Spark, PySpark, Hortonworks.
Confidential, Minneapolis, MN
Hadoop Developer
Responsibilities:
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala.
- Developed solutions to load data into HDFS and analyzed the data using MapReduce, Pig and Hive to produce summary results for downstream systems.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Developed Spark code by using Scala and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables.
- Maintained existing data migration program with occasional upgrades and enhancements.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores such as Hive.
- Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, packages and MySQL.
- Worked on data migration/ETL from Teradata to Hadoop.
- Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy and consistency.
- Optimized the Hive tables using optimization techniques like partitioning and bucketing to provide better performance.
- Worked in a data-centric role involving data migration and defining a data framework for reporting.
- Involved in migrating Hive queries to Impala.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, monitoring and troubleshooting; managed and reviewed data backups and log files.
- Built intermediate database creation scripts and data validation scripts, and tested the extracted data.
- Used AWS to fetch picture files for display in the UI.
- Analyzed the Hadoop cluster and different Big Data analytic tools including Hive, HBase and Sqoop.
- Created a Python/Django-based web application using Python scripting for data processing, MySQL for the database, and HTML/CSS/jQuery and Highcharts for data visualization of the served pages.
Environment: HDFS, MapReduce, Hive, Apache, Sqoop, AWS, Oozie Scheduler, Shell Scripts, HBase, Cloudera, Kafka, Spark, Scala, Hortonworks.
Confidential, Columbus, OH
Big data/Hadoop Developer
Responsibilities:
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Explored Spark to improve the performance and optimization of existing Hadoop MapReduce algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using Python and Java MapReduce.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Knowledge of designing and deploying Hadoop clusters and different Big Data analytic tools including Hive, HBase, Oozie, Sqoop and Spark.
- Performed real-time streaming of data using Spark with Kafka.
- Responsible for developing data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
- Hands-on coding: wrote and tested code for the ingest automation process (full and incremental loads). Designed the solution and developed the program for data ingestion using Sqoop, MapReduce, shell scripts and Python.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
- Implemented Spark RDD transformations and actions to support business analysis.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Involved in importing the real-time data to Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Built various graphs for business decision making using the Python matplotlib library.
- Used the Python library BeautifulSoup for web scraping to extract data for building graphs.
- Created Hive tables and was involved in data loading and writing Hive UDFs.
Environment: Scala, Spark, Spark Streaming, Spark SQL, HDFS, AWS, Apache, Hive, Pig, Linux, Oozie, MapReduce, Apache Kafka, Sqoop, S3.
Confidential
Hadoop Developer
Responsibilities:
- Provided an application demo to the client by designing and developing a search engine and report analysis trends.
- Assisted the architect in analyzing the existing and future systems. Prepared design blueprints and application flow documentation.
- Managed and reviewed Hadoop log files. Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources and applications. Supported MapReduce programs running on the cluster.
- Responsible for working with message broker systems such as Kafka. Extracted data from mainframes, fed it to Kafka and ingested it into HBase for analytics.
- Created MapReduce jobs to extract content from HBase and configured them in Oozie workflows to generate analytical reports.
- Wrote MapReduce programs to organize the data and ingest it in a client-specified format suitable for analytics.
- Hands-on experience in writing Python scripts to optimize performance.
- Extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
- Involved in writing Spark applications using Scala. Hands-on experience in creating RDDs, transformations and actions while implementing Spark applications.
- Stored the results derived from analysis in HBase and made them available for ingestion into Solr for indexing.
- Documented the challenges and issues involved in dealing with the security system and implemented best practices.
- Created project structures and configurations according to the project architecture and made them available to junior developers to continue their work.
Environment: Pig, Scala, Kafka, Hive, MapReduce, Apache, Sqoop, ZooKeeper, AWS, SVN, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3.
Confidential
Java Developer
Responsibilities:
- Involved in gathering system requirements for the application, worked with the business team to review the requirements, and went through the Software Requirement Specification and Architecture documents.
- Involved in intense User Interface (UI) operations and client-side validations using AJAX toolkit.
- Used SOAP to expose company applications as a Web Service to outside clients.
- Used a logging package for debugging.
- Used web services to create rate summaries, WSDL and SOAP messages to get insurance plans from a different module, and XML parsers for data retrieval.
- Developed business components and integrated them using Spring features such as dependency injection and autowiring of components such as DAO layers and service proxy layers.
- Used Spring AOP to implement distributed declarative transactions throughout the application.
- Wrote Hibernate configuration XML files to manage data persistence.
- Worked on the Delete Printer module using Python.
- Worked extensively with Python and REST APIs.
- Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
- Involved in the migration of Data from Excel, Flat file, Oracle, XML files to SQL Server by using BCP and DTS utility.
Environment: Java/J2EE, HTML, Axis, Servlets, Web Services, Apache, RESTful Web Services, Spring, DB2, RAD, Rational ClearCase, AWS, WCF, AJAX.
