Sr Big Data/Hadoop Developer Resume

Wilmington, DE

PROFESSIONAL SUMMARY:

  • Over 7 years of extensive hands-on experience in Hadoop/Big Data, Java/J2EE, and Python technologies, as well as various related IT technologies.
  • Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Excellent programming skills at a higher level of abstraction using Scala and Java.
  • Rich working experience in loading data into Hive tables, using Sqoop to import data from RDBMS, and writing Hive queries using join, order by, group by, etc.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions like Cloudera, MapR, and Hortonworks.
  • Good understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL databases like HBase, Cassandra, and MongoDB.
  • Experience with Apache Spark concepts in Scala, writing transformations in Scala for live streaming data, and clickstream analysis using Spark with Scala on data gathered from Kafka and Flume.
  • Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML, Apache Parquet, and Avro.
  • Experience using the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa according to client requirements.
  • Extensive experience importing and exporting data using stream processing platforms like Flume and Kafka.
  • Wrote Scala code for data analytics in Spark using map, reduceByKey, groupByKey, etc. to analyze real-time streaming data.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Excellent Python and Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP, and RESTful web services.
  • Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 for storage.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Python, Java/J2EE.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
  • Experience using IDEs such as Eclipse and IntelliJ and version control repositories such as SVN and Git.
  • Experience using the build tools Ant and Maven.
  • Strong knowledge of Spark with Scala for processing large volumes of streaming data.
  • Experience in designing components using UML (Use Case, Class, Sequence, Development, and Component diagrams) based on the requirements.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and Zookeeper.

Languages: Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts

Frameworks: MVC, Struts, Spring, Hibernate

NoSQL Databases: HBase, Cassandra, MongoDB

Cloud: AWS, Azure.

Operating Systems: HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web/Application servers: Apache Tomcat, WebLogic, JBoss.

Databases: Oracle, DB2, SQL Server, MySQL, Teradata

Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Version control: SVN, CVS, GIT

Web Services: REST, SOAP

PROFESSIONAL EXPERIENCE:

Confidential, Wilmington, DE

Sr Big Data/Hadoop Developer

Responsibilities:

  • Worked as a Sr. Big Data/Hadoop Developer with Hadoop ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
  • Followed Agile development methodology as an active member in Scrum meetings.
  • Worked in Azure environment for development of Custom Hadoop Applications.
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts.
  • Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
  • Extracted and loaded data into the Data Lake environment (MS Azure) using Sqoop, where it was accessed by business users.
  • Managed and supported enterprise data warehouse operations and advanced predictive big data application development using Cloudera and Hortonworks HDP.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications, and executed machine learning use cases with Spark ML and MLlib.
  • Installed Hadoop, MapReduce, and HDFS on Azure to develop multiple MapReduce jobs in Pig and Hive for data cleansing and pre-processing.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Improved the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Developed a Spark job in Python that indexes data into Elasticsearch from external Hive tables stored in HDFS.
  • Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, and loaded final data into HDFS.
  • Explored Spark as a way to improve the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
  • Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
  • Performed transformations like event joins, bot traffic filtering, and some pre-aggregations using Pig.
  • Explored MLlib algorithms in Spark to understand the possible machine learning functionalities that could be used for our use case.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop 3.0, Azure, Sqoop 1.4.6, Pig 0.17, Hive 2.3, MapReduce, Spark 2.2.1, Shell scripts, SQL, Hortonworks, Python, MLlib, HDFS, YARN, Kafka 1.0, Cassandra 3.11, Oozie, Agile

Confidential, Denver, CO

Sr Big Data/Hadoop Developer

Responsibilities:

  • Worked as a Big Data/Hadoop Developer providing solutions for big data problems.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
  • Designed, architected, and helped maintain scalable solutions on the big data analytics platform for the enterprise module.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Created real-time ingestion of structured and unstructured data into Hadoop and MemSQL using Kafka and Spark Streaming.
  • Populated data into dimension and fact tables and created Talend mappings.
  • Started using Apache NiFi to copy data from the local file system to HDP.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Migrated a physical data center environment to AWS; also designed, built, and deployed a multitude of applications utilizing almost all of the AWS stack (EC2, S3, RDS).
  • Implemented solutions for ingesting data from various sources and processing the data using big data technologies.
  • Read and wrote delimited files in HDFS as input and output data using Talend Big Data Studio with different Hadoop components.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Created tables in an RDBMS, inserted data, and then loaded the same tables into HDFS and Hive using Sqoop.
  • Worked with business stakeholders to translate business objectives and requirements into technical requirements and design.
  • Defined the application architecture and design for the Big Data Hadoop initiative to maintain structured and unstructured data; created reference architecture for the enterprise.
  • Identified data sources, created source-to-target mappings, performed storage estimation, and provided support for Hadoop cluster setup and data partitioning.
  • Developed scripts for data ingestion using Sqoop and Flume, wrote Spark SQL and Hive queries for analyzing the data, and performed performance optimization.
  • Responsible for developing a data pipeline on AWS to extract the data from weblogs and store it in Amazon EMR.
  • Wrote DDL and DML files to create and manipulate tables in the database.
  • Developed Unix shell/Python scripts for creating reports from Hive data.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Analyzed data using the Hadoop components Hive and Pig and created tables in Hive for the end users.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Automated the migration of Subversion (SVN) repositories to Git while preserving the commit history and other metadata like branches, tags, and authors.

Environment: Agile, Hive 2.3, Pig 0.17, Kafka, Spark, Apache NiFi, AWS, HDFS, Scala, ZooKeeper, Sqoop, HBase, Spark SQL, Amazon EMR, Apache Flume, Git, SVN

Confidential, Bloomington, IL

Big Data/Hadoop Developer

Responsibilities:

  • Supported and managed Hadoop clusters using the Hortonworks distribution deployed on the AWS cloud.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile devices, and network devices using Apache Kafka.
  • Developed an ingestion framework in Python using big data technologies with data stores such as DynamoDB and Cassandra.
  • Created RDDs and DataFrames for faster execution and performed data transformations and actions using Spark.
  • Developed optimal strategies for distributing the web log data over the cluster.
  • Implemented Hive Generic UDFs to incorporate business logic into Hive queries.
  • Configured Spark Streaming to receive real-time data from Kafka for high-speed data processing and store the stream data in HDFS.
  • Used Scala to read text data, CSV data, and image data from HDFS, S3, and Hive.
  • Worked on Spark SQL for faster execution of Hive queries using Spark SQL Context.
  • Implemented complex big data solutions with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data to turn information into business insights using multiple platforms in the Hadoop ecosystem.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
  • Involved in source system analysis, data analysis, and data modeling for ETL (Extract, Transform, and Load).
  • Wrote Spark programs to model data for extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
  • Imported data from the structured data source into HDFS using Sqoop incremental imports.
  • Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Built Hive tables using list partitioning and hash partitioning and created Hive Generic UDFs to process business logic with HiveQL.
  • Developed SQL scripts using Spark for handling different data sets and verifying the performance against MapReduce jobs.
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Python API.
  • Designed and unit tested data models and applications for data analytics solutions on streaming data.
  • Experience with centralized version control systems such as Subversion (SVN) and distributed version control systems such as Git.

Environment: Hortonworks, HDFS, Hive, Sqoop, Oozie, Storm, Scala 2.11.8, Spark 2.0, Spark SQL, Spark streaming, Python, Kafka, GitHub, Kerberos, AWS, Amazon S3, Amazon EC2, Amazon EBS, Tableau.

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Worked as a Hadoop Developer responsible for taking care of everything related to the clusters.
  • Developed Spark scripts using Java and Python shell commands as per the requirements.
  • Involved in ingesting data received from various relational database providers into HDFS for analysis and other big data operations.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked on Spark SQL and Data frames for faster execution of Hive queries using Spark SQL Context.
  • Performed analysis on implementing Spark using Scala.
  • Used DataFrames/Datasets to write SQL-style queries with Spark SQL against datasets stored in HDFS.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Created and imported various collections and documents into MongoDB and performed various actions like query, project, aggregation, sort, and limit.
  • Extensively experienced in deploying, managing, and developing MongoDB clusters.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
  • Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and move the data in and out of HDFS.
  • Implemented some of the big data operations on the AWS cloud.
  • Used Hibernate reverse engineering tools to generate domain model classes, perform association mapping and inheritance mapping using annotations and XML.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
  • Maintained cluster security using Kerberos and kept the cluster up and running at all times.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data, using Sqoop to move data from the Hadoop Distributed File System to relational database systems.
  • Created Hive tables to store the processed results in a tabular format.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Performed data transformations by writing MapReduce jobs as per business requirements.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Involved in implementing and integrating various NoSQL databases like HBase and Cassandra.
  • Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store it in HDFS.

Environment: Java, Spark, Python, HDFS, YARN, Hive, Scala, SQL, MongoDB, Sqoop, AWS, Pig, MapReduce, Cassandra, NoSQL
