
Hadoop/Spark Developer Resume


Sterling, VA

SUMMARY

  • 7+ years of IT experience across a variety of industries, including hands-on experience in Big Data Hadoop and Java development.
  • Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and Zookeeper.
  • Excellent knowledge on Hadoop Ecosystems such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Strong experience in writing applications in Python using libraries such as Pandas, NumPy, SciPy, and Matplotlib.
  • Good knowledge of machine learning algorithms in Python and related concepts such as data preprocessing, regression, classification, and appropriate model selection techniques (see the sketch following this list).
  • Good exposure to the Agile software development process.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions such as Cloudera, MapR, Microsoft Azure HDInsight, and Hortonworks.
  • Experience in implementing OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Good understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML, Parquet, and Avro.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Experience in migrating data using Sqoop from HDFS to relational database systems and vice-versa.
  • Extensive experience in importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Good understanding of Teradata, Zeppelin and SOLR.
  • Exceptionally good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Strong Experience of Data Warehousing ETL concepts using Informatica Power Center, OLAP, OLTP and AutoSys.
  • Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers, and strong experience in writing complex queries for Oracle.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 as a storage mechanism.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
  • Worked in large and small teams for systems requirement, design & development.
  • Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in a client-server, object-oriented environment; experienced in using IDEs such as Eclipse and IntelliJ and repositories such as SVN and Git.
  • Experience of using build tools Ant, Maven.
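
Illustrative sketch (see the machine-learning bullet above): a minimal Python example of data preprocessing, classification, and cross-validated model selection with pandas and scikit-learn; the CSV file and column names are hypothetical.

    import pandas as pd
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LogisticRegression

    # Data preprocessing: load, drop incomplete rows, split features and label
    df = pd.read_csv("customers.csv").dropna()          # hypothetical file
    X, y = df.drop(columns=["churned"]), df["churned"]  # hypothetical label column
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Classification with feature scaling in a pipeline
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

    # Model selection: 5-fold cross-validation on the training split
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print("CV accuracy: %.3f" % scores.mean())

    model.fit(X_train, y_train)
    print("Test accuracy: %.3f" % model.score(X_test, y_test))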

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, Map Reduce, HIVE, PIG, HBase, Sqoop, Flume, Oozie, Spark, Storm, Kafka, HCatalog, Impala, Datameer.

Distributed Platforms: Cloudera, Hortonworks, MapR, Azure HDInsight, and Apache Hadoop

Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts, HL7.

NoSQL Databases: MongoDB, Cassandra, HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Methodology: Agile/Scrum, Rational Unified Process and Waterfall

Monitoring tools: Ganglia, Nagios.

Hadoop/BigData Technologies: HDFS, MapReduce, Spark SQL, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper, Cloudera Manager, MongoDB, and the NoSQL database HBase

Version Control: GitHub, Bitbucket, CVS, SVN, Clear Case, Visual Source Safe

Build & Deployment Tools: Maven, ANT, Hudson, Jenkins

Database: Oracle, MS SQL Server 2005, MySQL, Teradata

PROFESSIONAL EXPERIENCE

Confidential, Sterling, VA

Hadoop/Spark Developer

Responsibilities:

  • In-depth understanding/knowledge of Hadoop architecture and various components such as HDFS, Application Master, Name Node, Master Node, Resource Manager, Data Node, and the MapReduce programming paradigm.
  • Managed the fully distributed Hadoop cluster as an additional responsibility; trained to take over the duties of a Hadoop Administrator, including managing the cluster and handling upgrades and installation of tools that use the Hadoop ecosystem.
  • Worked on installing and configuring Zookeeper to coordinate and monitor the cluster resources.
  • Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS storage.
  • Created data pipelines in the cloud using Azure Data Factory.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked with and learned a great deal from Amazon Web Services (AWS): EC2, S3, RDS, ELK.
  • Consumed data from Kafka using Apache Spark.
  • Load and transform large sets of structured, semi structured, and unstructured data.
  • Involved in loading data from LINUX file system to HDFS
  • Importing and exporting data into HDFS and Hive using Sqoop
  • Worked in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregating Functions (UDAF) for Hive and Pig using Python (see the sketch after this list).
  • Experience in importing data from S3 to HIVE using Sqoop and Kafka.
  • Good experience working with Amazon AWS for accessing Hadoop cluster components.
  • Responsible for loading data files from various external sources such as Oracle and MySQL into the data lake.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet the business requirements.
  • Actively involved in code review and bug fixing to improve performance.
  • Involved in development, building, testing, and deployment to the Hadoop cluster in distributed mode.
  • Created Linux shell scripts to automate the daily ingestion of IVR data.
  • Automated the history and purge process.
  • Monitoring production jobs using Control-M daily.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Developed the verification and control process for the daily load.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
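
Illustrative sketch (see the Python UDF bullet above): a minimal Python streaming script of the kind Hive can call through ADD FILE and TRANSFORM; the field layout and names are assumptions, not the actual project code.

    #!/usr/bin/env python
    # normalize_events.py -- Hive pipes tab-separated rows on stdin and
    # reads the transformed rows back from stdout.
    import sys

    for line in sys.stdin:
        user_id, event, ts = line.rstrip("\n").split("\t")
        # Example transformation: drop rows with an empty event name,
        # then normalize the id and event casing before re-emitting the row.
        if not event:
            continue
        print("\t".join([user_id.strip().upper(), event.strip().lower(), ts]))

In HiveQL such a script would typically be registered with ADD FILE normalize_events.py; and invoked as SELECT TRANSFORM(user_id, event, ts) USING 'python normalize_events.py' AS (user_id, event, ts) FROM raw_events; (table and columns hypothetical).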

Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Kafka, Apache Spark, Scala, AWS, S3, Shell Scripting, Azure, HBase, Python, Kerberos, Agile, Zookeeper, Maven, Ambari, Hortonworks, Control-M

Confidential, Hartford, CT

BigData Hadoop/Data Developer

Responsibilities:

  • Developing and maintaining a Data Lake containing regulatory data for federal reporting with big data technologies such as the Hadoop Distributed File System (HDFS), Apache Impala, Apache Hive, and the Cloudera distribution.
  • Developing different ETL jobs to extract data from data sources such as Oracle and Microsoft SQL Server, transform the extracted data using Hive Query Language (HQL), and load it into the Hadoop Distributed File System (HDFS).
  • Involved in importing data from different sources into HDFS using Sqoop, applying transformations using Hive and Spark, and then loading the data into Hive tables.
  • Fixing data-related issues within the Data Lake.
  • Primarily involved in the data migration process using Azure, integrating with the GitHub repository and Jenkins.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Used the in-memory computing capability of Spark to perform procedures such as text analysis and processing using Scala.
  • Primarily responsible for designing, implementing, testing, and maintaining the database solution for Azure.
  • Worked with Spark Streaming and divided data into different branches for batch processing through the Spark engine.
  • Implementing new functionality in the Data Lake using big data technologies such as the Hadoop Distributed File System (HDFS), Apache Impala, and Apache Hive, based on the requirements provided by the client.
  • Communicating regularly with the business teams and the project manager to ensure that any gaps between the client's requirements and the project's technical requirements are resolved.
  • Developing Python scripts using Hadoop Distributed File System APIs to generate curl commands to migrate data and to prepare different environments within the project (see the sketch after this list).
  • Monitoring production jobs using Control-M daily.
  • Coordinating production releases with the change management team using the Remedy tool.
  • Communicating effectively with team members and conducting code reviews.
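
Illustrative sketch (see the Python/curl bullet above): a minimal Python script that emits curl commands against the WebHDFS REST API to stage local files into HDFS; the NameNode host, user, and paths are hypothetical.

    #!/usr/bin/env python
    # gen_webhdfs_cmds.py -- print curl commands that create an HDFS target
    # directory and upload local files through the WebHDFS REST API.
    import os

    NAMENODE = "http://namenode.example.com:50070"   # hypothetical host:port
    HDFS_DIR = "/data/landing"                       # hypothetical target dir
    USER = "etl_user"                                # hypothetical HDFS user

    def mkdir_cmd(hdfs_dir):
        return ('curl -s -X PUT "%s/webhdfs/v1%s?op=MKDIRS&user.name=%s"'
                % (NAMENODE, hdfs_dir, USER))

    def put_cmd(local_path, hdfs_dir):
        target = "%s/%s" % (hdfs_dir, os.path.basename(local_path))
        # -L follows the NameNode's redirect to the DataNode that accepts the write
        return ('curl -s -L -X PUT -T %s "%s/webhdfs/v1%s?op=CREATE&overwrite=true&user.name=%s"'
                % (local_path, NAMENODE, target, USER))

    if __name__ == "__main__":
        print(mkdir_cmd(HDFS_DIR))
        for name in sorted(os.listdir("exports")):   # hypothetical local export dir
            print(put_cmd(os.path.join("exports", name), HDFS_DIR))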

Environment: Hadoop, Data Lake, Azure, Python, Spark, Hive, Cassandra, ETL Informatica, Cloudera, Oracle 10g, Microsoft SQL Server, Control-M, Linux

Confidential, Cary, NC

BigData Hadoop/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Migrated Pig scripts and MapReduce jobs to the Spark DataFrames API and Spark SQL to improve performance.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Expertise in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Developed DataFrames and case classes for the required input data and performed the data transformations using Spark Core.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop; also developed applications in Scala.
  • Expertise in deployment of Hadoop YARN, Spark, and Storm integration with Cassandra, Ignite, Kafka, etc.
  • Strong working experience on Cassandra for retrieving data from Cassandra clusters to run queries.
  • Developed a POC using Scala and deployed it on the YARN cluster; compared the performance of Spark with Hive and SQL.
  • Deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Experience in using Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Developed equivalent Spark Scala code for existing SAS code to extract summary insights from the Hive tables.
  • Responsible for importing data from different sources, such as MySQL databases, into HDFS and saving it in Avro and JSON file formats.
  • Experience in importing data from S3 to HIVE using Sqoop and Kafka.
  • Good experience working with Amazon AWS for accessing Hadoop cluster components.
  • Involved in creating partitioned Hive tables and loading and analyzing data using Hive queries; implemented partitioning and bucketing in Hive.
  • Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications, in order to implement the former in the project.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
  • Configured Hadoop clusters and coordinated with Big Data admins for cluster maintenance.
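
Illustrative sketch (see the Spark Streaming bullet above): a minimal PySpark example of consuming Kafka messages with a direct stream and persisting each micro-batch to Cassandra through the spark-cassandra-connector; broker, topic, keyspace, table, and field layout are assumptions.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="LearnerDataModel")
    sqlContext = SQLContext(sc)
    ssc = StreamingContext(sc, 10)  # 10-second batch interval

    # Direct stream from Kafka (hypothetical broker and topic)
    stream = KafkaUtils.createDirectStream(
        ssc, ["learner-events"], {"metadata.broker.list": "broker1:9092"})

    def persist_to_cassandra(rdd):
        if rdd.isEmpty():
            return
        # Parse each comma-separated message value into a Row
        rows = rdd.map(lambda kv: kv[1].split(",")) \
                  .map(lambda f: Row(learner_id=f[0], event=f[1], ts=f[2]))
        # Append the micro-batch to Cassandra (connector package on the classpath)
        sqlContext.createDataFrame(rows).write \
            .format("org.apache.spark.sql.cassandra") \
            .options(keyspace="learner", table="events") \
            .mode("append") \
            .save()

    stream.foreachRDD(persist_to_cassandra)
    ssc.start()
    ssc.awaitTermination()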

Environment: Hadoop, YARN, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.

Confidential

Hadoop Developer

Responsibilities:

  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Created HBase tables and column families to store the user event data.
  • Written automated HBase test cases for data quality checks using HBase command line tools.
  • Developed a data pipeline using HBase and Hive to ingest, transform, and analyze customer behavioral data.
  • Experience in collecting log data from different sources (web servers and social media) using Flume and storing it on HDFS to perform MapReduce jobs.
  • Handled importing of data from machine logs using Flume.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
  • Configured, monitored, and optimized Flume agents to capture web logs from the VPN server and put them into the Hadoop Data Lake.
  • Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
  • Exported the analyzed data to relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Wrote Java code to format XML documents; upload them to Solr server for indexing.
  • Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
  • Maintained all the services in the Hadoop ecosystem using Zookeeper.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Expertise in Extraction, Transformation, loading data from Oracle, DB2, SQL Server, MS Access, Excel, Flat Files and XML using Talend.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Helped design scalable Big Data clusters and solutions.
  • Followed the Agile methodology for the entire project.
  • Involved in review of functional and non-functional requirements.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
  • Developed workflows using Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Converted the existing relational database model to the Hadoop ecosystem.
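
Illustrative sketch (see the Hive-to-Spark bullet above): a minimal PySpark example converting a Hive aggregation query into RDD transformations; the table and column names are hypothetical.

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="HiveToSparkRDD")
    hc = HiveContext(sc)

    # Hive equivalent: SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id
    rows = hc.table("transactions").rdd   # hypothetical Hive table

    totals = (rows
              .map(lambda r: (r.customer_id, float(r.amount)))  # key each record by customer
              .reduceByKey(lambda a, b: a + b))                 # sum amounts per customer

    for customer_id, total in totals.take(10):
        print(customer_id, total)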

Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Kafka.

Confidential

Java/Hadoop Developer

Responsibilities:

  • Developed JSP, JSF, and Servlets to dynamically generate HTML and display the data on the client side.
  • Used the Hibernate framework for persistence to an Oracle database.
  • Wrote and debugged the Ant scripts for building the entire web application.
  • Developed web services in Java; experienced with SOAP and WSDL, and used WSDL to publish the services to another application.
  • Implemented Java Message Services (JMS) using JMS API.
  • Involved in managing and reviewing Hadoop log files.
  • Installed and configured Hadoop, YARN, MapReduce, Flume, and HDFS; developed multiple MapReduce jobs in Java for data cleaning (see the sketch after this list).
  • Coded Hadoop MapReduce jobs for energy generation and PS.
  • Coded using Servlets, SOAP clients, and Apache CXF REST APIs as the communication protocols for delivering data from our application to external and internal systems.
  • Worked on Cloudera distribution system for running Hadoop jobs on it.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, Pig, Solr, and Splunk.
  • Created a SOAP web service using JAX-WS to enable clients to consume a SOAP web service.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
  • Experienced in designing and developing multi-tier scalable applications using Java and J2EE Design Patterns.
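
Illustrative sketch (see the MapReduce data-cleaning bullet above): the jobs described were written with the Java MapReduce API; the same data-cleaning pattern is shown here as a Hadoop Streaming mapper in Python, with the field layout and validation rules as assumptions.

    #!/usr/bin/env python
    # clean_mapper.py -- Hadoop Streaming mapper that drops malformed records
    # and normalizes the remaining fields before re-emitting them.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        # Drop records with the wrong column count or an empty key field
        if len(fields) != 4 or not fields[0].strip():
            continue
        # Normalize whitespace and case, then emit as tab-separated output
        print("\t".join(f.strip().lower() for f in fields))

Such a script would typically be submitted with the hadoop-streaming jar, using an identity or aggregating reducer as needed.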

Environment: Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.

Confidential

Java/J2EE developer

Responsibilities:

  • Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which is used successfully in several production systems.
  • Spearheaded the Quick Wins project by working very closely with the business and end users to improve the website's ranking from 23rd to 6th in just 3 months.
  • Normalized Oracle database conforming to design concepts and best practices.
  • Resolved product complications at customer sites and funneled the insights to the development and deployment teams to adopt a long-term product development strategy with minimal roadblocks.
  • Convinced business users and analysts of alternative solutions that are more robust and simpler to implement from a technical perspective while satisfying the functional requirements from the business perspective.
  • Applied design patterns and OO design concepts to improve the existing Java/JEE-based code base.
  • Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized block of code.

Environment: Java 1.2/1.3, Swing, Applets, Servlets, JSP, custom tags, JNDI, JDBC, XML, XSL, DTD, HTML, CSS, JavaScript, Oracle, DB2, PL/SQL, WebLogic, JUnit, Log4J, and CVS
