Sr. Hadoop Developer Resume
Atlanta, GA
SUMMARY
- 8+ years of overall experience with strong emphasis on Design, Development, Implementation, Testing and Deployment of Software Applications.
- 5+ years of comprehensive IT experience in Big Data and Big Data Analytics, Hadoop, HDFS, MapReduce, YARN, the Hadoop Ecosystem, StreamSets and Shell Scripting.
- Highly capable of processing large sets of Structured, Semi-structured and Unstructured datasets and supporting Big Data applications.
- Hands on experience with Hadoop Ecosystem components like MapReduce (Processing), HDFS (Storage), YARN, Sqoop, Pig, Hive, HBase, Oozie, Zookeeper and Spark for data storage and analysis.
- Expertise in transferring data between the Hadoop ecosystem and structured data storage in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
- Experience in NoSQL databases like MongoDB, HBase and Cassandra.
- Excellent knowledge of Python collections and multi-threading.
- Skilled in Python, with proven expertise in adopting new tools and technical developments.
- Experience in Apache Spark clusters and stream processing using Spark Streaming.
- Worked with several Python packages such as NumPy, SciPy and PyTables.
- Expertise in moving large amounts of log, streaming event and transactional data using Flume.
- Experience in developing MapReduce jobs in Java for data cleaning and preprocessing.
- Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).
- Expertise in organizing data layouts using partitioning and bucketing in Hive (a brief sketch follows this section).
- Developed use cases and PoCs for various clients using Apache Spark/PySpark as next-generation Big Data and Fast Data platforms.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Expertise in preparing interactive data visualizations from different sources using Tableau software.
- Hands on experience in developing workflows that execute MapReduce, Sqoop, Pig, Hive and Shell scripts using Oozie.
- Experience working with Cloudera Hue Interface and Impala.
- Experience using Hadoop Ecosystem tools including Pig, Hive, HBase, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark, Scala and Storm.
- Hands on experience developing Solr Indexes using MapReduce Indexer Tool.
- Expertise in object-oriented analysis and design (OOAD) using UML and various design patterns.
- Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML and HTML.
- Fluent with the core Java concepts like I/O, Multi-threading, Exceptions, RegEx, Data Structures and Serialization.
- Performed unit testing using the JUnit testing framework and Log4J to monitor the error logs.
- Good knowledge of Python and the Python web framework Django.
- Experienced with Python frameworks like Webapp2 and Flask.
- Experience in process improvement, normalization/de-normalization, data extraction, cleansing and manipulation.
- Converted requirement specifications and source-system understanding into conceptual, logical and physical data models and data flow diagrams (DFDs).
- Expertise in working with transactional databases like Oracle, SQL Server, MySQL and DB2.
- Expertise in developing SQL queries, Stored Procedures and excellent development experience with Agile Methodology.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Excellent leadership, interpersonal, problem solving and time management skills.
- Excellent communication skills both Written (documentation) and Verbal (presentation).
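Illustrative sketch (not tied to a specific engagement): a minimal PySpark example of the partitioned and bucketed table layout referenced above. The database, table and column names are hypothetical; the equivalent Hive DDL uses PARTITIONED BY (...) and CLUSTERED BY (...) INTO n BUCKETS.

    # Hypothetical sales_db.orders table, partitioned by date and bucketed by customer.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-layout-sketch")
             .enableHiveSupport()
             .getOrCreate())

    orders = spark.table("sales_db.orders_staging")   # hypothetical staging table

    (orders.write
           .mode("overwrite")
           .partitionBy("order_date")                 # one directory per order_date
           .bucketBy(32, "customer_id")               # 32 buckets hashed on customer_id
           .sortBy("customer_id")
           .saveAsTable("sales_db.orders"))

    # Partition pruning: only the 2017-06-30 partition is scanned.
    daily = spark.sql("SELECT * FROM sales_db.orders WHERE order_date = '2017-06-30'")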
TECHNICAL SKILLS
Languages: SQL, C, C++, Java, J2EE, Pig Latin, Hive, Scala
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, Zookeeper, Cassandra, Cloudera CDH5, Python, PySpark, Solr and Hortonworks.
Databases: Oracle, MySQL, SQL Server, DB2, MongoDB, Teradata, HBase, Cassandra.
Scripting and Query Languages: UNIX Shell scripting, SQL and PL/SQL.
Web Technologies: JSP, Servlets, JavaBeans, JDBC, AWT, Swing, JSF, XML, CSS, HTML, XHTML, JavaScript, AJAX.
Operating Systems: Windows 7 & 8, UNIX, Linux, CentOS, Ubuntu.
Tools: Eclipse, Tableau, SQuirreL, Talend, Toad, SQL Server Studio, Git, SVN, Concurrent Versions System (CVS).
Reporting Tools: Crystal Reports, SQL Server Reporting Services and Data Reports, Business Intelligence and Reporting Tool (BIRT)
PROFESSIONAL EXPERIENCE
Confidential, Atlanta GA
Sr. Hadoop Developer
Responsibilities:
- Involved in moving legacy data from RDBMS, Mainframes, Teradata & External source systems data warehouse to Hadoop Data Lake and migrating the data processing to lake.
- Developed UNIX scripts to extract data from data files and load it into HDFS.
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames (a brief PySpark sketch appears at the end of this section).
- Developed Spark scripts using Python shell commands as per the requirements.
- Involved in developing balance control checks, including record count validation and file-naming-pattern checks, to validate data before ingesting it into the Data Lake (see the sketch at the end of this section).
- Developed and implemented core API services using Scala and Spark.
- Developed Hive queries for the analysts, implementing performance tuning on huge data sets.
- Worked with different file formats like AVRO, PARQUET and TEXTFILE for Hive querying and processing.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
- Used Avro format for storing data in RAWZ and Parquet for the final repository in APPZ.
- Experience working with large volumes of complex data in distributed frameworks such as Spark, using Python/Scala for batch and stream processing.
- Developed custom UDFs in Hive.
- Developed and implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Control-M jobs.
- Used the Sqoop/TDCH connector's import and export functionality to handle large data set transfers between the Teradata database and HDFS.
- Implemented StreamSets flow pipelines/topologies to perform cleansing operations before moving data into HDFS.
- Used Bitbucket as the code repository and Bamboo for code promotion.
- Worked with Apache StreamSets to convert fixed-width data into delimited format.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hive, Spark, Sqoop, Control-M, HBase, StreamSets, CDH-5.12.0, Hue, Cloudera Manager
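The sketch below illustrates, in PySpark, the kind of Hive-query-to-DataFrame conversion and Parquet write described above. Database, table, column and path names are hypothetical; RAWZ and APPZ refer to the raw and final zones mentioned in the bullets.

    # Re-expressing a Hive aggregation as DataFrame transformations and writing Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-dataframe-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Equivalent Hive query:
    #   SELECT account_id, SUM(amount) AS total_amount
    #   FROM rawz.transactions WHERE txn_date = '2017-06-30'
    #   GROUP BY account_id
    txns = spark.table("rawz.transactions")          # hypothetical raw-zone table

    daily_totals = (txns
                    .filter(F.col("txn_date") == "2017-06-30")
                    .groupBy("account_id")
                    .agg(F.sum("amount").alias("total_amount")))

    # Final repository stored as Parquet in the APPZ zone.
    daily_totals.write.mode("overwrite").parquet("/data/appz/daily_totals")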
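A minimal Python sketch of the balance control check idea above (record count validation plus file-naming-pattern check). File locations, the naming pattern and the control-file layout are hypothetical.

    # Pre-ingestion balance control check: naming pattern + record count vs. control file.
    import re
    import sys

    NAME_PATTERN = re.compile(r"^ORDERS_\d{8}\.dat$")   # e.g. ORDERS_20170630.dat (hypothetical)

    def balance_check(data_file, control_file, file_name):
        if not NAME_PATTERN.match(file_name):
            raise ValueError("File name %s does not match expected pattern" % file_name)

        with open(data_file) as fh:
            actual_count = sum(1 for _ in fh)

        # Control file assumed to carry the expected record count on its first line.
        with open(control_file) as fh:
            expected_count = int(fh.readline().strip())

        if actual_count != expected_count:
            raise ValueError("Record count mismatch: expected %d, got %d"
                             % (expected_count, actual_count))
        return True

    if __name__ == "__main__":
        balance_check(sys.argv[1], sys.argv[2], sys.argv[3])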
Confidential, Farmington Hills, MI
Sr. Hadoop Developer
Responsibilities:
- Responsible for understanding the requirements and implementing the security using AD Groups for the Dataset.
- Involved in low-level design for MapReduce, Hive, Impala and shell scripts to process data.
- Worked on ETL scripts to pull data from DB2/Oracle databases into HDFS.
- Experience in utilizing Spark machine learning techniques implemented in Scala.
- Involved in POC development and unit testing using Spark and Scala.
- Created Partitioned Hive tables and worked on them using Hive.
- Installed and configured Hive, Sqoop, Flume and Oozie on the Hadoop clusters.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed and implemented Python/Django applications.
- Developed a process for the batch ingestion of CSV files and Sqoop imports from different sources, and generated views on the data source using shell scripting and Python.
- Integrated a shell script to create collections/morphlines and Solr indexes on top of table directories using the MapReduce Indexer Tool within the batch ingestion framework.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Implemented partitioning, dynamic partitions and buckets in HIVE.
- Developed Hive Scripts to create the views and apply transformation logic in the Target Database.
- Involved in the design of Data Mart and Data Lake to provide faster insight into the Data.
- Used the StreamSets Data Collector tool and created data flows for one of the streaming applications.
- Experienced in using Kafka as a data pipeline between JMS (producer) and a Spark Streaming application (consumer); a brief sketch appears at the end of this section.
- Involved in the development of a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
- Analyzed the SQL scripts and designed the solution to implement them using PySpark.
- Skilled in using collections in Python for manipulating and looping through different user-defined objects.
- Wrote a Python module to connect to and view the status of an Apache Cassandra instance (see the sketch at the end of this section).
- Developed a script in Scala to read all the Parquet tables in a database and write them out as JSON files, and another script to register them as structured tables in Hive.
- Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
- Configured Zookeeper for cluster coordination services.
- Generated Python Django forms to record data of online users and used PyTest for writing test cases.
- Developed a unit test script to read a Parquet file for testing PySpark on the cluster.
- Involved in exploration of new technologies like AWS, Apache Flink and Apache NiFi, which can increase the business value.
Environment: Hadoop, HDFS, Hive, HBase, Zookeeper, Impala, Cloudera, Oracle, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python, Kafka, PySpark.
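A minimal PySpark sketch of a Kafka-to-Spark streaming consumer, assuming Spark 2.x with the spark-sql-kafka package available; the broker address, topic name and output paths are hypothetical (the production application described above was written in Scala with Spark Streaming).

    # Consume a Kafka topic, apply a simple transformation and land the stream as Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("kafka-stream-sketch")
             .getOrCreate())

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "jms.events")
              .load()
              .selectExpr("CAST(value AS STRING) AS payload"))

    parsed = events.withColumn("ingest_ts", F.current_timestamp())

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/streaming/events")
             .option("checkpointLocation", "/data/streaming/_checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()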
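A minimal sketch of the Cassandra status-check module mentioned above, assuming the DataStax cassandra-driver package; host names are hypothetical.

    # Connect to a Cassandra instance and report basic status.
    from cassandra.cluster import Cluster

    def cassandra_status(contact_points=("cassandra-node1",), port=9042):
        cluster = Cluster(contact_points=list(contact_points), port=port)
        session = cluster.connect()
        try:
            # Reading system.local confirms the node is reachable and returns its version.
            row = session.execute(
                "SELECT cluster_name, release_version FROM system.local").one()
            return {"cluster_name": row.cluster_name,
                    "release_version": row.release_version,
                    "hosts_up": [str(h.address)
                                 for h in cluster.metadata.all_hosts() if h.is_up]}
        finally:
            cluster.shutdown()

    if __name__ == "__main__":
        print(cassandra_status())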
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Responsible for gathering requirements from the business partners.
- Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
- Responsible for creation of mapping document from source fields to destination fields mapping.
- Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
- Developed Oozie workflows for executing Sqoop and Hive actions.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Involved in building the database model, APIs and views utilizing Python in order to build an interactive web-based solution.
- Performed performance optimizations on Spark/Scala; diagnosed and resolved performance issues.
- Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing custom properties required for the workflow (a brief sketch appears at the end of this section).
- Developed scripts to run Oozie workflows, capture the logs of all jobs that run on cluster and create a metadata table which specifies the execution times of each job.
- Developed Hive scripts for performing transformation logic and loading the data from the staging zone to the final landing zone.
- Worked on PySpark and Spark using Scala.
- Developed monitoring and notification tools using Python.
- Worked on the Parquet file format to get better storage and performance for publish tables.
- Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
- Developed Python utility to validate HDFS tables with source tables.
- Designed and developed UDFs to extend the functionality in both Pig and Hive.
- Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
- Managed datasets using Pandas data frames and MySQL; queried the MySQL database from Python using the Python-MySQL connector and MySQLdb package to retrieve information (see the sketch at the end of this section).
- Developed and tested many features for the dashboard using Python, Java, Bootstrap, CSS, JavaScript and jQuery.
- Responsible for checking the developed code into Harvest for release management as part of CI/CD.
- Involved in using the CA7 tool to set up dependencies at each level (table data, file and time).
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using Spark Context, Spark SQL, pair RDDs and Spark on YARN.
- Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
Environment: Hadoop, HDFS, Hive, HBase, Zookeeper, Oozie, Impala, Java (JDK 1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python, PySpark.
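A minimal sketch of the Python Sqoop wrapper for a date-range extract described above; the JDBC URL, credentials handling, table and column names are hypothetical, and in practice the password file would live on a secured HDFS path.

    # Build and run a Sqoop free-form-query import for a specific date range.
    import subprocess

    def sqoop_import_range(start_date, end_date, target_dir):
        query = ("SELECT * FROM orders WHERE order_date >= '{0}' "
                 "AND order_date < '{1}' AND $CONDITIONS").format(start_date, end_date)

        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:mysql://dbhost:3306/sales",
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop_pwd",
            "--query", query,
            "--split-by", "order_id",
            "--target-dir", target_dir,
            "--num-mappers", "4",
        ]
        subprocess.check_call(cmd)

    if __name__ == "__main__":
        sqoop_import_range("2016-01-01", "2016-02-01", "/data/staging/orders/2016-01")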
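A minimal sketch of querying MySQL from Python into a Pandas data frame, assuming the mysql-connector-python package; connection details and the table name are hypothetical.

    # Query MySQL and load the result set into a Pandas DataFrame.
    import mysql.connector
    import pandas as pd

    conn = mysql.connector.connect(host="dbhost", user="report_user",
                                   password="********",   # placeholder credentials
                                   database="sales")
    try:
        df = pd.read_sql(
            "SELECT order_id, customer_id, amount FROM orders LIMIT 1000", conn)
        print(df.describe())
    finally:
        conn.close()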
Confidential
Java / Hadoop Developer
Responsibilities:
- Responsible for understanding the scope of the project and requirements gathering
- Used MapReduce to Index the large amount of data to easily access specific records.
- Utilized Apache Hadoop ecosystem tools like HDFS, Hive and Pig for large datasets analysis.
- Worked with administrator to set up and monitor the Hadoop cluster.
- Developed MapReduce ETL in Java/Pig and data validation using Hive.
- Worked on Hive by creating external and internal tables, loading it with data and writing Hive queries.
- Supported MapReduce Programs which are running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to perform data filtering for unstructured data (a brief sketch appears at the end of this section).
- Designed the application by implementing the Struts Framework based on MVC architecture.
- Designed and developed the front end using JSP, HTML, JavaScript and jQuery.
- Developed framework for data processing using Design patterns, Java, XML.
- Implemented J2EE standards, MVC2 architecture using Struts Framework.
- Implemented Servlets, JSP and Ajax to design the user interface.
- Used JSP, JavaScript, HTML5 and CSS for manipulating, validating and customizing error messages in the user interface.
- Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
- Used Spring IoC for dependency injection with the Hibernate and Spring frameworks.
- Designed and developed Session beans to implement the Business logic.
- Developed EJB components that are deployed on the WebLogic Application Server.
- Wrote unit tests using the JUnit framework; logging was done using the Log4J framework.
- Used HTML, CSS, JavaScript and jQuery to develop front-end pages.
- Designed and developed various configuration files for Hibernate mappings.
- Designed and Developed SQL queries and Stored Procedures.
- Used XML, XSLT and XPath to extract data from Web Services output XML.
- Extensively used JavaScript, jQuery and AJAX for client-side validation.
- Used ANT scripts to fetch, build, and deploy application to development environment.
- Developed Web Services for sending and getting data from different applications using SOAP messages.
- Actively involved in code reviews and bug fixing.
- Applied CSS (Cascading Style Sheets) across the entire site for standardization.
- Provided offshore coordination and user acceptance testing support.
Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Oracle 10g, JUnit 4.2, Maven, Windows XP, J2EE, JSP, JDBC, Hibernate, Spring, HTML, XML, CSS, JavaScript and jQuery.
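A minimal data-filtering mapper sketch in Python (Hadoop Streaming style) illustrating the filtering idea referenced above; the production MapReduce programs were written in Java, and the field layout and validity rule here are hypothetical.

    # mapper.py -- drop malformed records; keep rows with the expected field count
    # and a non-empty record id in the first column.
    import sys

    EXPECTED_FIELDS = 5   # hypothetical record layout

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == EXPECTED_FIELDS and fields[0]:
            print("\t".join(fields))

Such a mapper would typically be submitted with the hadoop-streaming jar, with the filter criteria adjusted to the actual record format.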