
Hadoop Developer Resume


Troy Hills, NJ

SUMMARY

  • 7 years of experience in analysis, design, development, testing, implementation, maintenance, and enhancement of various IT projects, with Big Data experience implementing end-to-end Hadoop solutions.
  • Experience in working in environments using Agile (SCRUM) and Test-Driven development methodologies.
  • Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem, including Hadoop 2.x, HDFS, YARN, MapReduce, HBase, Hive, Pig, Spark, Scala, Kafka, Solr, Impala, Oozie, Airflow, and Snowflake.
  • Experience in developing MapReduce programs on Apache Hadoop for analyzing big data according to requirements.
  • Proven expertise in performing analytics on big data using MapReduce, Hive, and Pig.
  • Extensively used SQL, NumPy, Pandas, Spark ML, and Hive for data analysis and model building.
  • Experience with traditional ETL technologies as well as Apache Sqoop and Apache Flume.
  • Experience in analyzing, designing, and developing ETL Strategies and processes, writing ETL specifications.
  • Experience in using Talend Data Integration/ Talend Data Quality.
  • Sound knowledge in programming Spark using Scala/Python.
  • Experienced with the Spark processing framework, including Spark SQL, and with data warehousing and ETL processes.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
  • Developed end-to-end ETL pipelines using Spark SQL and Scala on the Spark engine; imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (a minimal sketch of this pattern follows this list).
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience with Spark Streaming and writing Spark jobs.
  • Worked on Kubernetes to orchestrate Docker containers for new and existing applications, as well as deployment and management of complex runtime environments.
  • Firm grip on data modeling, database performance tuning and NoSQL map-reduce systems.
  • Experience in NoSQL databases: MongoDB, HBase, Cassandra.
  • Experience with UI design / UX design for complex SaaS or cloud-based applications.
  • Good experience in building pipelines using Azure Data Factory and moving the data into Azure Data Lake Store.
  • Developed applications using Java, RDBMS, and Linux shell scripting.
  • Implemented Splunk solutions in highly available, redundant, distributed computing environments.
  • Experience in JAVA, J2EE, Web Services, HTML and XML related technologies demonstrating strong analytical and problem-solving skills, computer proficiency and ability to follow through with projects from inception to completion.
  • Experience with GitLab CI and Jenkins for continuous integration and end-to-end build and deployment automation (CI/CD). Proficient in documenting and implementing procedures related to build, deployment, and release.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases, and with core Java concepts: OOP, multithreading, collections, and I/O.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, ECS, EMR, RDS, and VPC.
  • Built large-scale data processing pipelines and data storage platforms using open-source big data technologies.
  • Experience in processing semi-structured and unstructured datasets.
  • Good understanding of Data Mining and Machine Learning techniques.
  • Experience in complete project life cycle of Client Server and Web applications.
  • Good interpersonal and communication skills, strong problem-solving skills, ability to explore and adopt new technologies with ease, and a good team player.
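
Illustrative sketch for the Spark-on-S3 bullet above: a minimal PySpark version of the "import from S3 into an RDD, transform, act" pattern. The bucket, paths, and field positions are placeholders, and the pipelines described in this resume were written in Scala; this is only an assumed, simplified equivalent.

    # Minimal PySpark sketch: read raw data from S3 into an RDD, apply
    # transformations, then run actions. All names and paths are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-rdd-etl-sketch").getOrCreate()
    sc = spark.sparkContext

    # Import raw text data from S3 into an RDD (assumes the s3a connector and
    # AWS credentials are already configured on the cluster).
    lines = sc.textFile("s3a://example-bucket/raw/events/")

    # Transformations: split CSV-like lines, keep valid records, key by the first field.
    records = (lines.map(lambda line: line.split(","))
                    .filter(lambda f: len(f) >= 3 and f[2] == "OK")
                    .map(lambda f: (f[0], 1)))

    # Actions: aggregate per key and persist the result.
    daily_counts = records.reduceByKey(lambda a, b: a + b)
    daily_counts.saveAsTextFile("s3a://example-bucket/curated/daily_counts/")

    spark.stop()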

TECHNICAL SKILLS

Big Data/Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Spark, Solr.

Databases: MySQL; NoSQL: MongoDB, HBase, Cassandra.

Programming Languages: Scala, Java, Python, R, SQL.

Web Technologies: JSP & Servlets, PHP, XML, HTML.

Operating Systems: Windows, Unix and Linux, Mac OS.

Front-end: HTML/HTML 5, CSS3.

Development Tools: Microsoft SQL Studio, Eclipse, MySQL Workbench, Tableau.

Visualization Tools: Tableau, TensorFlow.

Office Tools: Microsoft Office Suite.

Development Methodologies: Agile/Scrum, Design Patterns.

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential | Troy Hills, NJ

Responsibilities:

  • Installed and configured the Hadoop cluster.
  • Worked with the Cloudera support team to fine-tune the cluster.
  • Worked closely with the SA team to ensure all hardware and software were properly set up for optimum use of resources.
  • Developed a custom file system plugin for Hadoop so it can access files on the Hitachi Data Platform.
  • The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
  • Played a key role in installation and configuration of the various Hadoop ecosystem tools such as Solr, Kafka, Pig, HBase and Cassandra.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
  • Designed an ETL data pipeline to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
  • Imported and exported data between various databases (Oracle, MySQL) and HDFS using Talend.
  • Since this is a migration project, migrated jobs from DataStage to Talend using Big Data components.
  • The plugin also provided data locality for Hadoop across host nodes and virtual machines.
  • Wrote data ingesters and MapReduce programs.
  • Developed MapReduce jobs to analyze data and produce heuristic reports.
  • Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and in fine-tuning them per data set.
  • Performed extensive data validation using Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Created ETL/Talend jobs, both design and code, to process data into target databases.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Involved in loading data from UNIX file system to HDFS and created custom Solr Query components to enable optimum search matching.
  • Experienced with different scripting languages such as Python and shell scripting.
  • Added, decommissioned, and rebalanced nodes.
  • Worked with the HBase Java API to populate operational HBase tables with key-value data.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
  • Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into HDFS (a sketch of the DynamoDB-to-S3 step follows this list).
  • Performed data quality issue analysis using SnowSQL by building analytical warehouses on Snowflake.
  • Developed Splunk infrastructure and related solutions as per automation toolsets.
  • Applied patches and performed version upgrades.
  • Incident Management, Problem Management and Change Management.
  • Involved in the code migration of quality monitoring tool from AWS EC2 to AWS Lambda and built logical datasets to administer quality monitoring on Snowflake warehouses.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Automated and scheduled the Sqoop jobs using Unix shell scripts (see the Sqoop automation sketch after this list).
  • Scheduled MapReduce jobs using FIFO and fair-share scheduling.
  • Installed and configured other open-source software such as Pig, Hive, HBase, Flume, and Sqoop.
  • Troubleshot and debugged Talend issues while maintaining the health and performance of the ETL environment.
  • Designed and implemented a CI (Continuous Integration) system: configured Jenkins servers and Jenkins nodes, created the required CI/CD scripts, and created/configured VMs (Windows/Linux).
  • Integration with RDBMS using Sqoop and JDBC Connector
  • Worked with the development team to tune jobs; knowledge of writing Hive jobs.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
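
A rough sketch of the DynamoDB-to-S3 step referenced in the event-pipeline bullet above, using boto3. The table, bucket, and object key are invented placeholders, and the subsequent S3-to-HDFS copy is a separate step not shown here.

    # Rough sketch: dump a DynamoDB table to S3 as JSON. All names are placeholders.
    import json
    import boto3

    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
    s3 = boto3.client("s3", region_name="us-east-1")

    table = dynamodb.Table("events")  # placeholder table name

    # Scan the table page by page (adequate for modest tables; very large tables
    # would call for parallel scans or DynamoDB Streams instead).
    items = []
    response = table.scan()
    items.extend(response["Items"])
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])

    # Land the extract in S3; default=str copes with DynamoDB's Decimal values.
    s3.put_object(
        Bucket="example-landing-bucket",
        Key="dynamodb/events/extract.json",
        Body=json.dumps(items, default=str),
    )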
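
Sqoop automation sketch for the scheduling bullet above. The actual automation used Unix shell scripts; this Python wrapper around the sqoop CLI is only an assumed equivalent, and the connection string, credentials file, and table names are placeholders.

    # Hypothetical wrapper around the Sqoop CLI for nightly MySQL-to-HDFS imports.
    # Host, database, user, password file, and table names are placeholders.
    import subprocess

    TABLES = ["orders", "customers"]  # placeholder table list

    def sqoop_import(table):
        """Import one MySQL table into HDFS with Sqoop."""
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:mysql://db-host:3306/sales",
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop.password",  # keeps the password off the command line
            "--table", table,
            "--target-dir", "/data/raw/%s" % table,
            "--num-mappers", "4",
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        for t in TABLES:
            sqoop_import(t)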

Environment: Windows 2000/2003, UNIX, Linux, Java, Apache HDFS, MapReduce, Spark SQL, Scala, Avro, Storm, Cloudera, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL.

Hadoop Developer

Confidential | New York, NY

Responsibilities:

  • Worked on analyzing the Hadoop cluster and various big data analytics and processing tools, including Pig, Hive, Sqoop, Python, Spark with Scala and Java, and Spark Streaming.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase, and streamed data using Spark with Kafka.
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and MongoDB.
  • Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and server log data.
  • Created Talend custom components for the various use cases and worked on XML components, Data Quality, Processing and Log & Error components.
  • Developed Apache Spark applications using Scala and Python for data processing from various streaming sources.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark
  • Implemented Spark solutions to generate reports, fetch and load data in Cassandra
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Created Talend ETL jobs to receive attachment files from POP email using tPOP, tFileList, and tFileInputMail, then loaded data from the attachments into a database and archived the files.
  • Created complex ETL packages using SSIS to extract data from staging tables into partitioned tables with incremental load; also created reusable SSIS packages to extract data from multi-formatted flat files, Excel, and XML files into the UL database and DB2 billing systems.
  • Deep involvement in building ETL pipelines between several source systems and the enterprise data warehouse using Informatica PowerCenter, SSIS, SSAS, and SSRS.
  • Designed, developed, and implemented complex SSIS packages, asynchronous ETL processing, ad hoc reporting, the SSRS report server, and data mining in SSAS.
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
  • Wrote HiveQL to analyze the number of unique visitors and their visit information, such as views and most visited pages.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala and Python.
  • Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation with Airflow (see the DAG sketch after this list).
  • Experienced in Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics, data wrangling, and Excel data extracts.
  • Created the AWS VPC network for the installed instances and configured the security groups and Elastic IPs accordingly.
  • Worked on setting up Rancher orchestrator, to manage Kubernetes everywhere it runs, worked with Rancher CLI.
  • Experienced in working with the Amazon EMR framework for processing data on EMR and EC2 instances.
  • Managing messages on Kafka topics using Talend Jobs.
  • Designing and implementing complete end-to-end Hadoop Infrastructure including Pig, Hive, Sqoop, Oozie, Flume, and Zookeeper
  • Used Pig for transformations, event joins, the Elephant Bird API, and pre-aggregations performed before loading JSON-format files into HDFS.
  • Tested the processed data through various test cases to meet the business requirements.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (sketched after this list).
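
Sketch of the Kafka-to-Parquet flow from the last bullet, written in PySpark purely for illustration (the project code used Scala). It assumes Spark 2.x with the spark-streaming-kafka-0-8 package, JSON-encoded messages, and placeholder broker, topic, and HDFS path names.

    # Consume a Kafka topic with Spark Streaming, turn each micro-batch into a
    # DataFrame, and append it to HDFS as Parquet. All names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # needs spark-streaming-kafka-0-8

    spark = SparkSession.builder.appName("kafka-to-parquet-sketch").getOrCreate()
    ssc = StreamingContext(spark.sparkContext, batchDuration=60)  # 60-second batches

    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["events"],
        kafkaParams={"metadata.broker.list": "kafka-broker:9092"},
    )

    def save_batch(batch_time, rdd):
        """Convert one (key, value) micro-batch into a DataFrame and persist it."""
        if not rdd.isEmpty():
            df = spark.read.json(rdd.map(lambda kv: kv[1]))  # values assumed to be JSON
            df.write.mode("append").parquet("hdfs:///data/events/parquet/")

    stream.foreachRDD(save_batch)
    ssc.start()
    ssc.awaitTermination()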
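
Toy Airflow DAG for the workflow bullet above, assuming Airflow 1.10-style imports: a short delay sensor gating two daily tasks, with a task-level SLA. The DAG id, schedule, and spark-submit commands are placeholders, not the actual project workflows.

    # Toy DAG: time sensor -> ingest -> aggregate, with retries and an SLA.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.sensors.time_delta_sensor import TimeDeltaSensor

    default_args = {
        "owner": "data-eng",
        "start_date": datetime(2019, 1, 1),
        "retries": 1,
        "retry_delay": timedelta(minutes=10),
        "sla": timedelta(hours=2),  # SLA watcher: flag tasks that run past this
    }

    with DAG("daily_events_pipeline",
             default_args=default_args,
             schedule_interval="0 6 * * *",
             catchup=False) as dag:

        # Wait a little past the scheduled time so upstream files have landed.
        wait_for_upstream = TimeDeltaSensor(task_id="wait_for_upstream",
                                            delta=timedelta(minutes=30))

        ingest = BashOperator(task_id="ingest_events",
                              bash_command="spark-submit /opt/jobs/ingest_events.py")

        aggregate = BashOperator(task_id="aggregate_daily",
                                 bash_command="spark-submit /opt/jobs/aggregate_daily.py")

        wait_for_upstream >> ingest >> aggregate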

Environment: AWS, Ambari, Hive, Python, HBase, Spark, Talend, Scala, Map Reduce, HDFS, Sqoop, Impala, Linux, Shell scripting, Tableau.

Hadoop Developer

Confidential | Detroit, MI

Responsibilities:

  • Worked on Sqoop to import/export data into HDFS and Hive.
  • Experience with Pig programs for loading and filtering streaming data ingested into HDFS using Flume.
  • Moved large amounts of data into HBase using the MapReduce integration.
  • Experienced with MapReduce programs to clean and aggregate the data (a Hadoop Streaming sketch of this pattern follows this list).
  • Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
  • Worked on different kinds of custom filters and handled predefined filters on HBase using the API.
  • Developed counters on HBase data to count total records on different tables.
  • Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Implemented data pipeline by chaining multiple mappers by using Chained Mapper.
  • Created Hive Dynamic partitions to load time series data.
  • Experienced in handling different types of joins in Hive, such as map joins, bucket map joins, and sorted bucket map joins.
  • Experienced in importing/exporting data between HDFS/Hive and relational databases and Teradata using Sqoop.
  • Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
  • Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
  • Actively participated in software development lifecycle including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
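
The clean-and-aggregate MapReduce jobs at this client were written in Java; purely as an illustration of the pattern, here is an assumed Hadoop Streaming equivalent in Python (field positions and the validity rule are made up). It would be launched with the hadoop-streaming jar, e.g. hadoop jar hadoop-streaming.jar -file mapper.py -file reducer.py -mapper mapper.py -reducer reducer.py -input /data/raw -output /data/clean.

    #!/usr/bin/env python
    # mapper.py -- clean records and emit key<TAB>value pairs.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # drop malformed records (the cleaning step)
        print("%s\t%s" % (fields[0], fields[2]))  # e.g. customer_id -> amount

    #!/usr/bin/env python
    # reducer.py -- sum values per key; streaming delivers input sorted by key.
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, total))
            current_key, total = key, 0
        total += int(value)

    if current_key is not None:
        print("%s\t%d" % (current_key, total))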

Environment: HDFS, Map Reduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, Flat files, MySQL, CSV, Avro.

Hadoop Application Developer

Confidential

Responsibilities:

  • Built APIs that will allow customer service representatives to access the data and answer queries.
  • Designed changes to migrate current Hadoop jobs to HBase.
  • Handled fixing of defects efficiently and worked with the QA and BA team for clarifications
  • Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files
  • Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
  • The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports and established self-service reporting model in Cognos for business users
  • Implemented Bucketing and Partitioning using Hive to assist the users with data analysis
  • Used Oozie scripts for deployment of the application and Perforce as the version control software.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (a HiveQL sketch follows this list).
  • Develop database management systems for easy access, storage, and retrieval of data
  • Perform DB activities such as indexing, performance tuning, and backup and restore
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Performed various optimizations such as using the distributed cache for small datasets, and partitioning, bucketing, and map-side joins in Hive.
  • Expert in creating Pig and Hive UDFs using Java to analyze the data efficiently.
  • Responsible for loading the data from the BDW Oracle database and Teradata into HDFS using Sqoop.
  • Implemented AJAX, JSON, and JavaScript to create interactive web screens.
  • Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
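
HiveQL sketch for the partitioning and bucketing bullets above, submitted here through the PyHive client as one possible way to run it from Python; the original work ran directly in Hive, and the host, database, table, and column names are placeholders.

    # Create a partitioned, bucketed Hive table and load it with a dynamic-partition
    # insert. Connection details and schema are placeholders.
    from pyhive import hive

    conn = hive.connect(host="hiveserver2-host", port=10000,
                        username="etl_user", database="analytics")
    cur = conn.cursor()

    # Allow dynamic-partition inserts.
    cur.execute("SET hive.exec.dynamic.partition=true")
    cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Partitioned + bucketed target table.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales_part (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Dynamic-partition load from a staging table (partition column last in the SELECT).
    cur.execute("""
        INSERT OVERWRITE TABLE sales_part PARTITION (order_date)
        SELECT order_id, customer_id, amount, order_date
        FROM sales_staging
    """)

    cur.close()
    conn.close()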

Environment: AWS, Hadoop, Pig, Hive, MapReduce, HDFS, Sqoop, Impala, Tableau, Oozie, Linux.

Java Developer

Confidential

Responsibilities:

  • Involved in intense User Interface (UI) operations and client-side validations using AJAX toolkit.
  • Used SOAP to expose company applications as a Web Service to outside clients.
  • Used the logging package for debugging.
  • Used web services for creating rate summaries, used WSDL and SOAP messages for getting insurance plans from the different modules, and used XML parsers for data retrieval.
  • Developed business components and integrated them using Spring features such as dependency injection and auto-wiring of components such as DAO layers and service proxy layers.
  • Used Spring AOP to implement distributed declarative transactions throughout the application.
  • Wrote Hibernate configuration XML files to manage data persistence.
  • Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
  • Involved in the migration of data from Excel, flat files, Oracle, and XML files to SQL Server using the BCP and DTS utilities.

Environment: Java/J2EE, HTML, Axis, Servlets, Web services, Apache, Restful Web Services, Spring, DB2, RAD, Rational Clear case, AWS, WCF, AJAX.
