
Sr. Hadoop Developer Resume


Atlanta, GA

SUMMARY

  • 8+ years of overall experience with strong emphasis on Design, Development, Implementation, Testing and Deployment of Software Applications.
  • Over 5 years of comprehensive IT experience in Big Data and Big Data Analytics, Hadoop, HDFS, MapReduce, YARN, the Hadoop Ecosystem, StreamSets and Shell Scripting.
  • Highly capable of processing large sets of Structured, Semi-structured and Unstructured datasets and supporting Big Data applications.
  • Hands on experience with Hadoop Ecosystem components like MapReduce (Processing), HDFS (Storage), YARN, Sqoop, Pig, Hive, HBase, Oozie, Zookeeper and Spark for data storage and analysis.
  • Expertise in transferring data between a Hadoop ecosystem and structured data storage in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
  • Experience in NoSQL databases like MongoDB, HBase and Cassandra.
  • Excellent knowledge of Python collections and multi-threading.
  • Skilled in Python, with proven expertise in adopting new tools and technical developments.
  • Experience in Apache Spark clusters and stream processing using Spark Streaming.
  • Worked on several Python packages like NumPy, SciPy, PyTables, etc.
  • Expertise in moving large amounts of log, streaming event data and Transactional data using Flume.
  • Experience in developing MapReduce jobs in Java for data cleaning and preprocessing.
  • Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).
  • Expertise in organizing data layouts using Partitions and Bucketing in Hive.
  • Developed use cases and PoCs for various clients using Apache Spark/PySpark as next-generation Big Data and Fast Data platforms.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data (a minimal sketch follows this list).
  • Expertise in preparing interactive data visualizations using Tableau software from different sources.
  • Hands on experience in developing workflows that execute MapReduce, Sqoop, Pig, Hive and Shell scripts using Oozie.
  • Experience working with Cloudera Hue Interface and Impala.
  • Experience using Hadoop Ecosystem tools including Pig, Hive, HBase, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark, Scala and Storm.
  • Hands on experience developing Solr Indexes using MapReduce Indexer Tool.
  • Expertise in Object-oriented analysis and design (OOAD) like UML and use of various design patterns.
  • Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML and HTML.
  • Fluent with the core Java concepts like I/O, Multi-threading, Exceptions, RegEx, Data Structures and Serialization.
  • Performed unit testing using Junit Testing Framework and Log4J to monitor the error logs.
  • Good knowledge of Python and the Python web framework Django.
  • Experienced with Python frameworks like Webapp2 and Flask.
  • Experience in process improvement, normalization/de-normalization, data extraction, cleansing and manipulation.
  • Converted requirement specifications and source system understanding into conceptual, logical and physical data models and data flow diagrams (DFD).
  • Expertise in working with transactional databases like Oracle, SQL Server, MySQL and DB2.
  • Expertise in developing SQL queries, Stored Procedures and excellent development experience with Agile Methodology.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
  • Excellent leadership, interpersonal, problem solving and time management skills.
  • Excellent communication skills both Written (documentation) and Verbal (presentation).
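
A minimal PySpark sketch of the Hive partitioning/bucketing and Spark SQL/DataFrame work summarized above; the table and column names (txn_raw, txn_by_date, txn_ts, customer_id) are illustrative assumptions, not actual project artifacts.

```python
# Illustrative sketch only: table and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("hive-layout-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read a raw Hive table into a DataFrame and apply Spark SQL transformations.
raw = spark.table("txn_raw")
cleaned = (raw.filter(F.col("amount").isNotNull())
              .withColumn("txn_date", F.to_date("txn_ts")))

# Persist as a Hive table partitioned by date and bucketed by customer id,
# so date-range queries prune partitions and joins on customer_id avoid shuffles.
(cleaned.write
        .mode("overwrite")
        .partitionBy("txn_date")
        .bucketBy(16, "customer_id")
        .sortBy("customer_id")
        .saveAsTable("txn_by_date"))
```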

TECHNICAL SKILLS

Languages: SQL, C, C++, Java, J2EE, Pig Latin, Hive, Scala

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, Zookeeper, Cassandra, Cloudera CDH5, Python, PySpark, Solr and Hortonworks.

Databases: Oracle, MySQL, SQL Server, DB2, MongoDB, Teradata, HBase, Cassandra.

Scripting and Query Languages: UNIX Shell scripting, SQL and PL/SQL.

Web Technologies: JSP, Servlets, JavaBeans, JDBC, AWT, Swing, JSF, XML, CSS, HTML, XHTML, JavaScript, AJAX.

Operating Systems: Windows 7 & 8, UNIX, Linux, CentOS, Ubuntu.

Tools: Eclipse, Tableau, SQuirreL, Talend, Toad, SQL Server Studio, Git, SVN, Concurrent Versions System (CVS).

Reporting Tools: Crystal Reports, SQL Server Reporting Services and Data Reports, Business Intelligence and Reporting Tool (BIRT)

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Sr. Hadoop Developer

Responsibilities:

  • Involved in moving legacy data from RDBMS, Mainframes, Teradata and external source system data warehouses to the Hadoop Data Lake and migrating the data processing to the lake.
  • Developed UNIX scripts to extract data from data files and load it into HDFS.
  • Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames.
  • Developed Spark scripts using Python and shell commands as per the requirements.
  • Involved in developing Balance Control Checks, including record count validation and file naming pattern checks, to validate the data before ingesting it into the Data Lake (see the sketch after this list).
  • Developed and implemented core API services using Scala and Spark.
  • Developed Hive queries for the analysts, implementing performance tuning on huge data sets.
  • Worked with different file formats like AVRO, PARQUET and TEXTFILE for Hive querying and processing.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Have knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Used the Avro format for storing data in the RAWZ zone and Parquet for the final repository in APPZ.
  • Experience working with large volumes of complex data, in distributed frameworks such as Spark using python/ Scala processing in batches/streams.
  • Developed custom UDFs in Hive.
  • Developed and implemented daily cron jobs that automate parallel tasks of loading the data into HDFS and pre-processing it with Control-M jobs.
  • Used the Sqoop/TDCH connector import and export functionality to handle large data set transfers between the Teradata database and HDFS.
  • Implemented StreamSets flow pipelines/topologies to perform cleansing operations before moving data into HDFS.
  • Used Bitbucket as the code repository and Bamboo for code promotion.
  • Worked with Apache StreamSets to perform the conversion of fixed-width data into delimited format.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings
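
The balance control check and raw-to-final zone flow above can be sketched roughly as follows in PySpark; the paths, zone names and control-file layout are assumptions for illustration, and reading Avro requires the spark-avro package on the classpath.

```python
# Illustrative sketch only: paths, zone names and control-file layout are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("balance-control-check").getOrCreate()

# Read the raw-zone Avro feed and the control file carrying the expected record count.
feed = spark.read.format("avro").load("/data/rawz/customer_feed/2019-06-01")
expected = int(spark.read.text("/data/control/customer_feed/2019-06-01.cnt").first()[0])

# Balance control check: fail the load if the landed count does not match the control count.
actual = feed.count()
if actual != expected:
    raise ValueError(f"Balance check failed: expected {expected} records, found {actual}")

# On success, write the validated data to the final repository zone as Parquet.
feed.write.mode("overwrite").parquet("/data/appz/customer_feed/2019-06-01")
```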

Environment: Hive, Spark, Sqoop, Control-M, HBase, StreamSets, CDH 5.12.0, Hue, Cloudera Manager

Confidential, Farmington Hills, MI

Sr. Hadoop Developer

Responsibilities:

  • Responsible for understanding the requirements and implementing the security using AD Groups for the Dataset.
  • Involved in low-level design for MR, Hive, Impala and shell scripts to process data.
  • Worked on ETL scripts to pull data from DB2/Oracle databases into HDFS.
  • Experience in utilizing Spark machine learning techniques implemented in Scala.
  • Involved in POC development and unit testing using Spark and Scala.
  • Created Partitioned Hive tables and worked on them using Hive.
  • Installing and configuring Hive, Sqoop, Flume, Oozie on the Hadoop clusters.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed and implemented Python/Django applications.
  • Developed a process for batch ingestion of CSV files and Sqoop imports from different sources, and generated views on the data sources using shell scripting and Python.
  • Integrated a shell script to create collections/morphlines and Solr indexes on top of table directories using the MapReduce Indexer Tool within the Batch Ingestion Framework.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Implemented partitioning, dynamic partitions and buckets in HIVE.
  • Developed Hive Scripts to create the views and apply transformation logic in the Target Database.
  • Involved in the design of Data Mart and Data Lake to provide faster insight into the Data.
  • Involved in using the StreamSets Data Collector tool and created data flows for one of the streaming applications.
  • Experienced in using Kafka as a data pipeline between JMS (producer) and a Spark Streaming application (consumer); see the sketch after this list.
  • Involved in the development of a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Analyzed the SQL scripts and designed the solution to implement using Pyspark.
  • Skilled in using collections in Python for manipulating and looping through different user defined objects.
  • Wrote a Python module to connect and view the status of an Apache Cassandra instance.
  • Developed a script in Scala to read all the Parquet Tables in a Database and parse them as Json files, another script to parse them as structured tables in Hive.
  • Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Configured Zookeeper for Cluster co - ordination services.
  • Generated Python Django forms to record data of online users and used PyTest for writing test cases
  • Developed a unit test script to read a Parquet file for testing PySpark on the cluster.
  • Involved in the exploration of new technologies like AWS, Apache Flink and Apache NiFi that can increase business value.
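
A rough sketch of the Kafka-to-Spark consumer pattern referenced above. The production job was written in Scala with Spark Streaming; this Python Structured Streaming version, with an assumed broker, topic and output paths, only illustrates the shape of the pipeline and needs the spark-sql-kafka connector on the classpath.

```python
# Illustrative sketch only: broker, topic and paths are assumptions; the real job used Scala.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Consume messages that an upstream JMS bridge publishes to a Kafka topic.
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")
               .option("subscribe", "source-events")
               .load()
               .selectExpr("CAST(value AS STRING) AS payload"))

# Apply a simple transformation and land the stream in HDFS as Parquet.
query = (events.withColumn("ingest_ts", F.current_timestamp())
               .writeStream
               .format("parquet")
               .option("path", "/data/landing/source_events")
               .option("checkpointLocation", "/data/checkpoints/source_events")
               .start())

query.awaitTermination()
```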

Environment: Hadoop, HDFS, Hive, HBase, Zookeeper, Impala, Cloudera, Oracle, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python, Kafka, PySpark.

Confidential, Austin, TX

Hadoop Developer

Responsibilities:

  • Responsible for gathering requirements from the business partners.
  • Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
  • Responsible for creation of mapping document from source fields to destination fields mapping.
  • Developed a shell script to create staging, landing tables with the same schema like the source and generate the properties which are used by Oozie jobs.
  • Developed Oozie workflows for executing Sqoop and Hive actions.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Involved in building database models, APIs and views utilizing Python in order to build an interactive web-based solution.
  • Performance optimizations on Spark/Scala. Diagnose and resolve performance issues.
  • Responsible for developing Python wrapper scripts which will extract specific date range using Sqoop by passing custom properties required for the workflow.
  • Developed scripts to run Oozie workflows, capture the logs of all jobs that run on cluster and create a metadata table which specifies the execution times of each job.
  • Developed Hive scripts for performing transformation logic and also loading the data from staging zone to final landing zone.
  • Worked on PySpark and on Spark using Scala.
  • Developed monitoring and notification tools using Python.
  • Worked on Parquet File format to get a better storage and performance for publish tables.
  • Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
  • Developed Python utility to validate HDFS tables with source tables.
  • Designed and developed UDF S to extend the functionality in both PIG and HIVE.
  • Import and Export of data using Sqoop between MySQL to HDFS on regular basis.
  • Managed datasets using pandas DataFrames and MySQL; queried MySQL databases from Python using the MySQL Connector and MySQLdb packages to retrieve information (see the sketch after this list).
  • Developed and tested many features for the dashboard using Python, Java, Bootstrap, CSS, JavaScript and jQuery.
  • Responsible for checking the developed code into Harvest for release management as part of CI/CD.
  • Involved in using the CA7 tool to set up dependencies at each level (table data, file and time).
  • Automated all the jobs for pulling data from the FTP server to load data into Hive tables using Oozie workflows.
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, Pair RDDs and Spark on YARN.
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
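
The pandas/MySQL work above might look roughly like the following; the connection details, table and query are assumptions for illustration, and the example uses MySQL Connector/Python (MySQLdb could be swapped in the same way).

```python
# Illustrative sketch only: host, credentials, schema and query are assumptions.
import mysql.connector
import pandas as pd

# Connect to MySQL with the MySQL Connector/Python package.
conn = mysql.connector.connect(
    host="db-host", user="report_user", password="changeme", database="sales"
)

# Pull a result set into a pandas DataFrame for downstream manipulation.
df = pd.read_sql(
    "SELECT order_id, amount, order_date FROM orders WHERE order_date >= %s",
    conn,
    params=["2018-01-01"],
)

# Example manipulation: daily totals used to cross-check the Hive-side aggregates.
daily_totals = df.groupby("order_date", as_index=False)["amount"].sum()
print(daily_totals.head())

conn.close()
```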

Environment: Hadoop, HDFS, Hive, HBase, Zookeeper, Oozie, Impala, Java (JDK 1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python, PySpark.

Confidential

Java / Hadoop Developer

Responsibilities:

  • Responsible for understanding the scope of the project and requirements gathering
  • Used MapReduce to Index the large amount of data to easily access specific records.
  • Utilized Apache Hadoop ecosystem tools like HDFS, Hive and Pig for large datasets analysis.
  • Worked with administrator to set up and monitor the Hadoop cluster.
  • Developed MapReduce ETL in Java/Pig and data validation using Hive.
  • Worked on Hive by creating external and internal tables, loading it with data and writing Hive queries.
  • Supported MapReduce Programs which are running on the cluster.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed MapReduce programs to perform data filtering for unstructured data.
  • Designed the application by implementing the Struts Framework based on MVC architecture.
  • Designed and developed the front end using JSP, HTML, JavaScript and jQuery.
  • Developed framework for data processing using Design patterns, Java, XML.
  • Implemented J2EE standards, MVC2 architecture using Struts Framework.
  • Implementing Servlets, JSP and Ajax to design the user interface.
  • Used JSP, Java Script, HTML5, and CSS for manipulating, validating, customizing, error messages to the User Interface.
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
  • Used Spring IoC for dependency injection with Hibernate and the Spring Framework.
  • Designed and developed Session beans to implement the Business logic.
  • Developed EJB components that are deployed on the WebLogic Application Server.
  • Wrote unit tests using the JUnit framework; logging is done using the Log4j framework.
  • Used HTML, CSS, JavaScript and jQuery to develop front-end pages.
  • Designed and developed various configuration files for Hibernate mappings.
  • Designed and Developed SQL queries and Stored Procedures.
  • Used XML, XSLT and XPath to extract data from Web Services output XML.
  • Extensively used JavaScript, jQuery and AJAX for client-side validation.
  • Used ANT scripts to fetch, build, and deploy application to development environment.
  • Developed Web Services for sending and getting data from different applications using SOAP messages.
  • Actively involved in code reviews and bug fixing.
  • Applied CSS (Cascading style Sheets) for entire site for standardization of the site.
  • Offshore coordination and user acceptance testing support.

Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Oracle 10g, JUnit 4.2, Maven, Windows XP, J2EE, JSP, JDBC, Hibernate, Spring, HTML, XML, CSS, JavaScript and jQuery.
