Hadoop Developer Resume
Jessup, PA
SUMMARY:
- 7+ years of extensive hands-on experience with the Hadoop ecosystem stack, including HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Oozie, Flume, Kafka, Zookeeper, and Spark.
- Experience with different Hadoop distributions such as Cloudera and Hortonworks Data Platform (HDP).
- Comfortable working with various facets of the Hadoop ecosystem: real-time or batch, structured or unstructured data processing.
- Experience with NoSQL databases like HBase as well as other ecosystem components such as Zookeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, and Flume.
- Expertise in delivering analytics projects using Big Data technologies.
- Hands-on experience ingesting data from external servers into Hadoop.
- Experience in moving large volumes of log, streaming event, and transactional data using Flume.
- Hands-on experience developing Oozie workflows that execute Sqoop, Pig, Hive, and shell scripts.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Good experience with Hive data warehousing concepts such as static/dynamic partitioning, bucketing, managed and external tables, and join operations on tables.
- Proficient in building user-defined functions (UDFs) in Hive and Pig to analyze data and extend HiveQL and Pig Latin functionality.
- Experience working with Spark transformations and actions on RDDs, and with Spark SQL and DataFrames in Python (a brief sketch follows this summary).
- Experience implementing a unified data ingestion platform using Kafka producers and consumers.
- Experience implementing near real-time event processing and analytics using Spark Streaming.
- Proficient with Flume topologies for data ingestion from streaming sources into Hadoop.
- Well versed with the major Hadoop distributions: Cloudera and Hortonworks.
- Experience with the Eclipse and NetBeans IDEs.
- Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
- Solid development experience working with Agile methodology.
- Strong experience in the distinct phases of the Software Development Life Cycle (SDLC), including planning, design, development, and testing of software applications.
- Excellent leadership, interpersonal, problem solving and time management skills.
- Excellent communication skills both written (documentation) and verbal (presentation).
- Responsible team player who can also work independently with minimal supervision.
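A minimal, hypothetical PySpark sketch of the RDD and DataFrame work summarized above; the file paths, column names, and dataset are illustrative assumptions rather than code from any of the projects below:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

# RDD transformations and actions: count ERROR lines in a hypothetical log directory.
logs = spark.sparkContext.textFile("hdfs:///data/app/logs/*.log")
error_count = logs.filter(lambda line: "ERROR" in line).count()
print("error lines:", error_count)

# Spark SQL / DataFrames: aggregate a hypothetical orders dataset by day.
orders = (spark.read.option("header", "true")
          .csv("hdfs:///data/sales/orders.csv")
          .withColumn("amount", F.col("amount").cast("double")))
daily = (orders.groupBy("order_date")
         .agg(F.count("*").alias("orders"), F.sum("amount").alias("revenue")))
daily.show()

spark.stop()
```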
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential - Jessup, PA
Responsibilities:
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters using Agile methodology.
- Monitored multiple Hadoop cluster environments using Ganglia; tracked workload, job performance, and capacity planning using Cloudera Manager.
- Gained thorough hands-on experience with Hadoop, Java, SQL, and Python.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Participated in functional reviews, test specifications, and documentation review.
- Wrote MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent (see the sketch after this section).
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased products on the website.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by Business Intelligence tools.
- Proactively monitored systems and services; worked on architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Documented system processes and procedures for future reference; responsible for managing data coming from different sources.
Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting.
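The log analysis above used MapReduce and HiveQL; as a hedged illustration of the unique-visitors-per-day idea only, here is a minimal Hadoop Streaming sketch in Python. The log layout, field positions, and HDFS paths are assumptions, not the original code:

```python
#!/usr/bin/env python
# mapper.py - emit (date, visitor_ip) pairs from web log lines.
# Assumed combined-log layout: "ip - - [dd/Mon/yyyy:HH:MM:SS ...] \"GET /page ...\" ..."
import sys

for line in sys.stdin:
    parts = line.split()
    if len(parts) < 4:
        continue                                   # skip malformed lines
    ip = parts[0]
    date = parts[3].lstrip("[").split(":")[0]      # e.g. 12/Mar/2018
    print("%s\t%s" % (date, ip))
```

```python
#!/usr/bin/env python
# reducer.py - count distinct visitor IPs per date.
# Hadoop Streaming sorts mapper output by key, so all rows for a date arrive together.
import sys

current_date, ips = None, set()
for line in sys.stdin:
    date, ip = line.rstrip("\n").split("\t", 1)
    if date != current_date:
        if current_date is not None:
            print("%s\t%d" % (current_date, len(ips)))
        current_date, ips = date, set()
    ips.add(ip)
if current_date is not None:
    print("%s\t%d" % (current_date, len(ips)))
```

A typical submission would pass both scripts to the hadoop-streaming jar with -mapper/-reducer flags and hypothetical -input/-output HDFS paths.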
Hadoop Developer
Confidential - Pittsburgh, PA
Responsibilities:
- Worked on Spark SQL to handle structured data in Hive.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization (see the sketch after this section).
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data residing on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and to move data into and out of HDFS.
- Created files and tuned SQL queries in Hive using HUE.
- Involved in collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
- Created the Hive external tables using Accumulo connector.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Created custom Solr query components to optimize search matching.
- Developed Spark scripts in Python using PySpark shell commands.
- Stored the processed results in the data warehouse and maintained the data using Hive.
- Worked with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
- Created Oozie workflows and coordinator jobs to launch jobs on schedule and on data availability.
- Worked with NoSQL databases such as MongoDB, creating MongoDB collections to load large sets of semi-structured data.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked with and learned a great deal about Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
Environment: HDFS, MapReduce, Storm, Hive, Pig, Sqoop, MongoDB, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, SOLR, GIT, Maven.
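A minimal PySpark sketch of the Hive work described above (creating a partitioned table, loading one day of data, and querying it with Spark SQL). The database, table, column names, and paths are illustrative assumptions, not the project's actual schema:

```python
from pyspark.sql import SparkSession

# enableHiveSupport lets Spark SQL use the Hive metastore for table definitions.
spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Hypothetical partitioned Hive table for click events, stored as ORC.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.click_events (
        user_id  STRING,
        url      STRING,
        duration INT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Load one day of raw JSON from a staging area and insert it into that day's partition.
spark.read.json("hdfs:///staging/clicks/2018-06-01/").createOrReplaceTempView("raw_clicks")
spark.sql("""
    INSERT INTO analytics.click_events PARTITION (event_date = '2018-06-01')
    SELECT user_id, url, duration FROM raw_clicks
""")

# Query the structured data back with Spark SQL.
spark.sql("""
    SELECT event_date, COUNT(DISTINCT user_id) AS daily_users
    FROM analytics.click_events
    GROUP BY event_date
""").show()
```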
Hadoop Consultant
Confidential - Phoenix, AZ
Responsibilities:
- Automated data pulls into HDFS from MySQL Server and Oracle databases using Sqoop.
- Analyzed source data tables for the best possible loading strategies.
- Involved in various SDLC stages of the project, such as planning, hardware and software estimation, and installation.
- Developed shell scripts to perform various ETL jobs, such as creating staging and final tables.
- Implemented a two-level staging process for data validation.
- Extracted data from staging tables and analyzed data using Impala.
- Implemented ad-hoc queries using Impala; created tables with partitioning and bucketing to load data.
- Created a Spark application to process and stream data from Kafka to MySQL (see the sketch after this section).
- Implemented Hive incremental updates using a four-step strategy to load incremental data from RDBMS systems.
- Implemented and configured optimization techniques such as bucketing, partitioning, and appropriate file formats.
- Used Spark to analyze data in Hive, HBase, and HDFS.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning and management, and performance tuning.
- Monitored and debugged Hadoop jobs and applications running in production.
- Wrote Pig scripts to read data from HDFS and write it into Hive tables.
- Performance-tuned Hive ETL scripts, Pig scripts, and MapReduce jobs in the production environment by adjusting job parameters.
- Provided various hourly/weekly/monthly aggregation reports required by clients through Spark.
- Worked on data processing, mainly converting unstructured data to semi-structured data and loading it into Hive and HBase tables for integration.
- Loaded log data into HDFS using Flume.
- Wrote Apache Pig scripts to process data stored in HDFS.
- Developed Spark SQL scripts with Python for analysis and demo purposes.
Environment: MapReduce, Spark, HDFS, Pig, HBase, Oozie, Zookeeper, Sqoop, Linux, Kafka, Hadoop, Maven, NoSQL, MySQL, Hive, Java, Eclipse, Python.
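A hedged sketch of the Kafka-to-MySQL Spark application mentioned above, using PySpark Structured Streaming. The broker address, topic, event schema, table name, and credentials are hypothetical placeholders, and the job assumes the Spark-Kafka connector and the MySQL JDBC driver are on the classpath:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-mysql-sketch").getOrCreate()

# Hypothetical schema of the JSON events arriving on the Kafka topic.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("product",  StringType()),
    StructField("amount",   DoubleType()),
])

# Read the stream from Kafka (broker and topic names are assumptions).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "orders")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write each micro-batch to MySQL over JDBC.
def write_to_mysql(batch_df, batch_id):
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:mysql://dbhost:3306/sales")
        .option("dbtable", "orders_stream")
        .option("user", "etl_user")
        .option("password", "********")
        .mode("append")
        .save())

query = (events.writeStream
         .foreachBatch(write_to_mysql)
         .option("checkpointLocation", "hdfs:///checkpoints/orders")
         .start())
query.awaitTermination()
```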
Hadoop Developer
Confidential
Responsibilities:
- Worked on Hadoop Ecosystem using different big data analytic tools including Hive, Pig.
- Involved in loading data from LINUX file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented partitioning and bucketing in Hive.
- Worked with different file formats (ORC, TextFile) and compression codecs (Gzip, Snappy, LZO).
- Worked with multiple input formats such as Text, Key-Value, and SequenceFile input formats.
- Ran Hadoop Streaming jobs to process terabytes of JSON-format data (a Python sketch follows this section).
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Executed Hive queries on Parquet tables to perform data analysis and meet the business requirements.
- Created HBase tables to store various data formats of incoming data from different portfolios.
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Developed the verification and control process for daily load.
- Provided daily production support, monitoring and troubleshooting Hadoop/Hive jobs.
- Worked collaboratively with different teams to smoothly transition the project to production.
Environment: HDFS, Pig, Hive, Sqoop, Shell Scripting, HBase, Zookeeper, MySQL.
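As a hedged illustration of the Hadoop Streaming JSON processing noted above, a minimal Python mapper of the kind such jobs typically use; the field names and invocation paths are assumptions, and the reducer would aggregate per portfolio in the same style as the earlier sketch:

```python
#!/usr/bin/env python
# mapper.py - Hadoop Streaming mapper that flattens JSON records
# into tab-separated (portfolio, amount) pairs for a downstream reducer.
import json
import sys

for line in sys.stdin:
    try:
        record = json.loads(line)
    except ValueError:
        continue                                   # skip malformed JSON lines
    portfolio = record.get("portfolio", "unknown")  # hypothetical field names
    amount = record.get("amount", 0)
    print("%s\t%s" % (portfolio, amount))

# Typical invocation (jar and HDFS paths are placeholders):
# hadoop jar hadoop-streaming-*.jar \
#   -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py \
#   -input /data/portfolio_events -output /data/portfolio_agg
```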
Software Developer
Confidential
Responsibilities:
- Performed analysis of the client requirements based on the detailed design documents developed.
- Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models using Microsoft Visio.
- Developed Struts forms and actions for validation of user request data and application functionality.
- Developed a web service using SOAP, WSDL, XML, and SoapUI.
- Developed JSPs with Struts custom tags and implemented JavaScript validation of data.
- Involved in developing the business tier using stateless session beans.
- Used JavaScript for web page validation and the Struts Validator for server-side validation.
- Designed the database and coded SQL, PL/SQL, triggers, and views using IBM DB2.
- Applied the Business Delegate, Data Transfer Object, and Data Access Object design patterns.
- Developed Message Driven Beans for asynchronous processing of alerts.
- Used ClearCase for source code control and JUnit for unit testing.
- Simulated networks in real time using an ns-3 network simulator modified for multithreading across multiple cores, implemented on a generic Linux machine.
- Involved in peer code reviews and performed integration testing of the modules.
Environment: Struts, JSPs with Struts, JDBC, Struts Validator, SQL, PL/SQL, IBM DB2, JUnit, Java/J2EE, JSP, Servlets, EJB 2.0, SQL Server, Oracle 9i, JBoss & WebLogic Server 6, JavaScript.
TECHNICAL SKILLS
Programming Languages: C, C++, Java (core), J2EE, UNIX Shell Scripting, Python
Web Languages: HTML, JavaScript, CSS
Hadoop Ecosystem: MapReduce, HBase, Hive, Pig, Sqoop, Zookeeper, Oozie, Flume, HUE, Kafka, AWS EMR, Spark, Spark SQL
Database Languages: SQL, PL/SQL
Databases: MySQL, Oracle, NoSQL (HBase, MongoDB)
Virtualization & Cloud Tools: Amazon AWS, VMware, VirtualBox
Visualization Tools: Power BI, Tableau
Web/Application Servers: Apache Tomcat
Version Control Tools: Git and SVN
Operating Systems: Windows, Linux (Ubuntu, Red Hat, CentOS)
IDE Platforms: Eclipse, NetBeans, Visual Studio
Methodologies: Agile, SDLC