Sr. Hadoop Developer Resume
Boca Raton, FL
SUMMARY:
- 7+ years of programming experience in the analysis, design, testing, and deployment of software applications, including 5 years of strong work experience in the Hadoop ecosystem and Big Data analytics.
- Expertise in Big Data technologies and the Hadoop ecosystem: HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN architecture, and the MapReduce programming paradigm.
- Real-world experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Experience in managing and reviewing Hadoop log files.
- Well versed in developing complex MapReduce programs using Apache Hadoop for analyzing Big Data.
- Expertise in Database Design, Creation and Management of Schemas, writing Stored Procedures, Functions, SQL queries.
- Worked on a large-scale multi-tenant distributed cluster with more than 600 data nodes.
- Expertise in optimizing and replicating petabytes of data and minimizing operational failures across the Big Data platform.
- Built custom Pig Loader and Storage classes to work with a variety of data formats such as JSON, compressed CSV, ORC, and Avro, and to read data from sources such as HBase and Hive.
- Development expertise with RDBMSs such as Oracle, SQL Server, Teradata, and MS SQL, and NoSQL databases such as HBase and Cassandra.
- Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Experience in processing data received through a message router.
- Hands-on experience writing ad-hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
- Working experience in importing and exporting data using Sqoop from Relational Database Systems (RDBMS) to HDFS.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs); an illustrative sketch follows this summary.
- Experience in analyzing data using Hive QL, Pig Latin, Spark and custom MapReduce programs in Java.
- Developed Pig Latin scripts for data cleansing and Transformation.
- Good knowledge of Python scripting.
- Expertise in writing complex shell scripts to process files before loading them into HDFS.
- Experience implementing Spark using Python (PySpark) and Spark SQL for faster analysis and processing of data.
- Proficiency with the application servers like WebSphere, WebLogic, JBOSS and Tomcat.
- Developed core modules in large cross-platform applications using Java, J2EE, Spring, Struts, Hibernate, JAX-WS Web Services, and JMS.
- Experienced in developing Map Reduce programs using Apache Hadoop for working with Big Data.
- Developed Sqoop Scripts to extract data from DB2 EDW source databases onto HDFS.
- Experience with various data transformation and analysis tools such as MapReduce, Pig, and Hive to handle files in multiple formats (JSON, text, XML, binary, logs, etc.).
- Extensive experience importing and exporting data using stream-processing platforms like Flume and Kafka.
- Worked with Oracle and Teradata for data import/export operations from different data marts.
- Configured internode communication between Cassandra nodes and client using SSL encryption.
- Expertise in NoSQL databases including HBase.
- Developed Spark applications using Python (PySpark).
- Expertise in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
- Experience in Data Analysis, Data Cleansing (Scrubbing), Data Validation and Verification, Data Conversion, Data Migrations and Data Mining.
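Below is a minimal, illustrative sketch of the kind of custom Hive UDF work described above. The class name, function name, and masking behavior are hypothetical placeholders, not code from an actual engagement.

```java
// Hypothetical Hive UDF: masks all but the last four characters of a string column,
// e.g. SELECT mask_value(ssn) FROM customers;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MaskValueUDF extends UDF {

    // Hive calls evaluate() once per row with the column value.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                          // preserve NULLs as Hive expects
        }
        String value = input.toString();
        int visible = Math.min(4, value.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - visible; i++) {
            masked.append('*');                   // replace leading characters
        }
        masked.append(value.substring(value.length() - visible));
        return new Text(masked.toString());
    }
}
```

Packaged into a JAR, a UDF like this is typically registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION mask_value AS '<package>.MaskValueUDF'.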
TECHNICAL SKILLS:
Languages: C, C++, Java/J2EE, Python, Scala, SQL, HiveQL, Pig Latin
Hadoop Ecosystem: HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, Apache Cassandra, Flume, Spark, Storm, Avro, AWS
Web Technologies: Servlets, JSP, J2EE, JDK, JDBC
Framework: Core Spring, Spring DAO, Spring MVC, Hibernate
Web/Application Servers: Jetty, Apache Tomcat
Scripting Languages: JavaScript, jQuery, AJAX, JSTL, CSS
Markup Languages: HTML, XML
XML: DOM, SAX, DTD, XSD, SOAP, REST, JAXB, XSL, XSLT
Databases: Oracle, MySQL, MS SQL Server 2005, Derby, MS Access
OS: MS-Windows 95/98/NT/2000/XP/7, Linux, Unix, Solaris 5.1
Methodologies: OOP, Agile, Scrum, Extreme Programming
Version Control Tools: SVN, CVS, Git
Tools: Eclipse, Maven, ANT, JUnit, TestNG, Jenkins, SoapUI, PuTTY, Log4j, Bugzilla
ETL Tools: Ab Initio GDE 1.14/1.15/3.0.4, Co-Operating System 2.14/2.15/3.16
PROFESSIONAL EXPERIENCE:
Confidential, Boca Raton, FL
Sr. Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Involved in data extraction from distributed RDBMS like Teradata and Oracle.
- Worked extensively with Hive, Sqoop, MapReduce, shell scripting, Pig, and Python.
- Used Sqoop to move structured data from MySQL into HDFS, Hive, and HBase for processing with Pig.
- Involved in loading data from UNIX file system to HDFS.
- Wrote MapReduce jobs to discover trends in data usage by users (an illustrative sketch follows this list).
- Used Oozie to schedule workflows that perform shell and Hive actions.
- Used CDH3 and CDH4 distributions for development and deployment.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Developed HiveQL scripts to perform incremental loads.
- Exported result sets from Hive to MySQL using shell scripts.
- Used Pig built-in functions to convert fixed-width files to delimited files.
- Installed and configured Pig for ETL jobs.
- Used agent E2E chaining in Flume for reliability and failover.
- Used Flume to stream data into HDFS from various sources.
- Used Hive join queries to join multiple source-system tables and load the results into Elasticsearch.
- Created and modified interactive dashboards and guided navigation links within those dashboards using Tableau.
- Utilized advanced Tableau features to link data from different connections on one dashboard and to filter data in multiple views at once.
- Extensively used analytical features in Tableau such as statistical functions and calculations, trend lines, and forecasting in many reports.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Involved in data migration from one cluster to another cluster.
- Experience writing aggregation logic in different combinations to perform complex data analytics for business needs.
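A minimal sketch of the usage-trend MapReduce work referenced above, assuming a hypothetical tab-delimited log layout (user id in field 0, date in field 1); the job counts events per user per day.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UsageTrendJob {

    public static class UsageMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            if (fields.length < 2) {
                return;                               // skip malformed records
            }
            outKey.set(fields[0] + "\t" + fields[1]); // userId <tab> date
            context.write(outKey, ONE);
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) {
                total += c.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "usage-trends");
        job.setJarByClass(UsageTrendJob.class);
        job.setMapperClass(UsageMapper.class);
        job.setCombinerClass(SumReducer.class);         // safe: the sum is associative
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```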
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase (NoSQL database), Tableau, Java 1.6, CentOS, and UNIX Shell Scripting.
Confidential, Santa Clara, CA
Hadoop Developer/DA
Responsibilities:
- Involved in the analysis, development, and implementation of a data lake application and various reusable components using Hadoop and its ecosystem.
- Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Created custom scripts to identify, store, and manage the Datasets in Hadoop Cluster.
- Extensively used Sqoop to get data from RDBMS.
- Developed custom UDFs in Java to extend Hive and Pig functionality.
- Tested Hadoop components on sample datasets in local and pseudo-distributed modes.
- Created separate flows for each dataset to load data into the data lake.
- Involved in data cleansing and preprocessing of source files before loading them into the Hadoop cluster.
- Developed Hive queries to implement business logic and invoke various services.
- Used File System check (FSCK) to check the health of files in HDFS.
- Worked on ingesting different file formats such as JSON, XML, and text files using the ingestion framework.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
- Handled data imports from various sources, performed transformations using Hive, MapReduce, and Spark, and loaded the data into HDFS (an illustrative sketch follows this list).
- Created complex and efficient Hive tables used in the data lake.
- Created data page rules that store data temporarily on the client side to improve performance.
- Implemented TWS scheduler jobs that automate daily, weekly, and monthly jobs on different server nodes.
- Analyzed data and applied various techniques to make it clean and non-redundant.
- Prepared data for insertion into lower-environment databases to validate scenarios.
- Developed complex shell scripts to process the data for different sources before ingestion.
- Experience encrypting and masking SPI data stored in Hadoop clusters using Voltage transformations.
- Published data to downstream systems for analysis.
- Worked on Automation of the jobs using IBM Tivoli.
- Subscribed to data from different sources through the data router, message router, and other subsystems.
- Participated in daily scrum meetings to understand requirements, develop plans, and allocate development time; also worked with the business to fine-tune models and concepts into data models.
- Worked on Spark applications in Python that generate data from different datasets and publish it to end clients.
- Designed application rules that relate application objectives, specifications, and actors to objects created as part of the ingestion framework.
- Participated in design meetings to plan unit and integration tests.
- Created regression test cases from business scenarios for specific flows.
- Provided technical expertise to offshore/onshore team members and helped them with technical issues.
- Updated the end-user manual for end users, operators, administrators, and support users.
- Analyzed logs to identify hung threads, server-related issues, and application slowness.
- Traced the application, checked clipboard size, and used the profiler while launching the application for analysis.
- Reduced barriers between developers, QA, the scrum master, and clients through frequent communication about development progress and defect fixes.
- Responsible for error-free deployments to higher environments so users could continue working on applications without interruption.
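The ingestion and transformation flow described above can be outlined with a short Spark job. The project work itself used PySpark; this sketch uses the Spark Java API to keep the examples in one language, and the dataset names, paths, and columns are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IngestJsonToLake {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ingest-json-to-lake")
                .enableHiveSupport()                  // allow writing managed Hive tables
                .getOrCreate();

        // Read raw JSON files from a hypothetical landing zone on HDFS.
        Dataset<Row> raw = spark.read().json("hdfs:///data/landing/events/*.json");

        // Basic cleansing: drop records without an event id and remove duplicates.
        Dataset<Row> cleaned = raw.filter("event_id IS NOT NULL")
                                  .dropDuplicates(new String[] {"event_id"});

        // Persist into a partitioned data-lake table for downstream consumers.
        cleaned.write()
               .mode("append")
               .partitionBy("event_date")
               .saveAsTable("datalake.events_cleaned");

        spark.stop();
    }
}
```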
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase (NoSQL database), Tableau, Java 1.6, CentOS, and UNIX Shell Scripting.
Confidential, IL
HADOOP DEVELOPER
RESPONSIBILITIES:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Queried and analyzed data in Cassandra through CQL for quick searching, sorting, and grouping (an illustrative sketch follows this list).
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented advanced Spark procedures such as text analytics and processing using its in-memory computing capabilities.
- Experience developing CRUD operations.
- Experience in schema definition.
- Wrote Pig Latin scripts to process the data.
- Optimized application performance for the Cassandra cluster.
- Written Hive queries for data analysis to meet the Business requirements.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; handled cluster coordination through ZooKeeper.
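A minimal sketch of the Cassandra querying mentioned above, using the DataStax Java driver (3.x API); the contact point, keyspace, table, and columns are hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraQueryExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                     .addContactPoint("cassandra-node1")   // hypothetical contact point
                     .build();
             Session session = cluster.connect("analytics")) {

            // Restricting the query to the partition key keeps CQL lookups fast.
            ResultSet results = session.execute(
                    "SELECT event_time, event_type, payload "
                  + "FROM user_events WHERE user_id = 'u123' LIMIT 100");

            for (Row row : results) {
                System.out.printf("%s %s%n",
                        row.getTimestamp("event_time"), row.getString("event_type"));
            }
        }
    }
}
```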
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distribution of Cloudera, MapR, DataStax, Spark, SQL, Tableau, Pig, ZooKeeper, Sqoop, Flume, Teradata, CentOS, Servlets, JDBC, JSP, JSTL, JPA, JavaScript, Eclipse, CVS, CSS, XML, JSON.
Confidential, OH
HADOOP DEVELOPER
RESPONSIBILITIES:
- Worked on the BI team on Big Data Hadoop cluster implementation and data integration for large-scale system software development.
- Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture met the needs of the business unit and the enterprise and allowed for business growth.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, loading data into HDFS.
- Developed and maintained complex outbound notification applications running on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and web services.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Developed Hive queries for the analysts (an illustrative sketch follows this list).
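A minimal sketch of serving an analyst query against Hive from Java over the HiveServer2 JDBC interface; the host, credentials, tables, and columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveTrendQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");   // HiveServer2 JDBC driver

        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-server:10000/edw", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Compare fresh staging data against an EDW reference table.
            String hql = "SELECT r.region, COUNT(*) AS orders "
                       + "FROM staging_orders s JOIN region_ref r ON s.region_id = r.region_id "
                       + "GROUP BY r.region";

            try (ResultSet rs = stmt.executeQuery(hql)) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + "\t" + rs.getLong("orders"));
                }
            }
        }
    }
}
```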
ENVIRONMENT: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distribution of Hortonworks, MapR, DataStax, IBM DataStage, PL/SQL, SQL*Plus, Pig, Hive, Sqoop, Oozie, Flume, HBase, UNIX Shell Scripting.
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in analysis, design, development, integration, and testing of application modules following the Agile/Scrum methodology. Participated in estimating the size of backlog items, daily scrums, and translating backlog items into engineering designs and logical units of work (tasks).
- Involved in developing a custom framework, similar to the Struts framework, with additional features to meet business needs.
- Performed requirement analysis, design, coding and implementation, team coordination, code review, testing, and installation.
- Developed server-side utilities using J2EE technologies: Servlets, JSP, and Struts.
- Developed presentation layers using JSP custom tags and JavaScript.
- Implemented design patterns - Business Delegate, Singleton, Flow Controller, DAO and Value Object patterns.
- Developed Role Based Access Control to restrict the users to access specific modules based on their roles.
- Used Oracle as the back-end database and the Hibernate framework for O/R mapping (an illustrative sketch follows this list).
- Deployed the application on WebSphere server using Eclipse as the IDE.
- Used Tomcat server 5.5 and configured it with Eclipse IDE.
- Performed extensive Unit Testing for the application.
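A minimal sketch of the DAO and Singleton patterns with Hibernate mentioned above; the Account entity, its mapping file, and the method names are hypothetical placeholders.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

// Hypothetical persistent entity, assumed to be mapped via Account.hbm.xml (not shown).
class Account {
    private Long id;
    private String owner;
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getOwner() { return owner; }
    public void setOwner(String owner) { this.owner = owner; }
}

public class AccountDao {

    // Singleton SessionFactory: expensive to build, so it is shared application-wide.
    private static final SessionFactory SESSION_FACTORY =
            new Configuration().configure().buildSessionFactory();

    public void save(Account account) {
        Session session = SESSION_FACTORY.openSession();
        Transaction tx = null;
        try {
            tx = session.beginTransaction();
            session.save(account);                     // Hibernate handles the O/R mapping
            tx.commit();
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();
            }
            throw e;
        } finally {
            session.close();
        }
    }

    public Account findById(Long id) {
        Session session = SESSION_FACTORY.openSession();
        try {
            return (Account) session.get(Account.class, id);
        } finally {
            session.close();
        }
    }
}
```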
Environment: J2EE Custom Framework, WebSphere 5.1, Tomcat 5.0, Oracle 9i, Hibernate 3.0, SAS 9, Eclipse 3.2, JSP, JavaScript, Servlets, XML, Eclipse plug-ins (JUnit, Tomcat).
Confidential
Jr Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL queries and stored procedures (an illustrative sketch follows this list).
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
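A minimal sketch of calling a stored procedure through JDBC, as described above; the connection URL, credentials, and procedure name (get_orders_by_customer) are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class OrderReportDao {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://db-host:1433;databaseName=orders", "app_user", "secret");
             CallableStatement call = conn.prepareCall("{call get_orders_by_customer(?)}")) {

            call.setInt(1, 42);                           // hypothetical customer id parameter
            try (ResultSet rs = call.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("order_id") + "\t" + rs.getString("status"));
                }
            }
        }
    }
}
```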
Environment: Java, JSP, Servlets, JDBC, HTML, JavaScript, MySQL, JUnit, Eclipse IDE.
