Hadoop/Spark Developer Resume
Detroit, MI
SUMMARY
- Around 8 years of professional experience in the IT industry, including developing, implementing, and configuring Hadoop ecosystem components in Linux environments, developing and maintaining applications using Java and J2EE, and devising strategies for deploying Big Data technologies to efficiently meet large-scale data processing requirements.
- 4 years of experience as a Hadoop Developer with sound knowledge of Hadoop ecosystem technologies.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Impala, Kafka, and Storm.
- Excellent programming skills at a high level of abstraction using Scala and Spark.
- Good understanding of real-time data processing using Spark.
- Hands-on experience importing and exporting data between HDFS and databases such as MySQL, Oracle, and Teradata using Sqoop.
- Strong experience building real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Flume, MapReduce, and Hive (see the sketch after this summary).
- Experience working with NoSQL databases including Cassandra, MongoDB, and HBase.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Experience in managing and reviewing Hadoop Log files.
- Used ZooKeeper to provide coordination services to the cluster.
- Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Experience with and understanding of Spark and Storm.
- Hands-on experience extracting data from log files and copying it into HDFS using Flume.
- Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
- Experience designing and coding web applications using Core Java and web technologies (JSP, Servlets, JDBC), with a full understanding of the J2EE technology stack, including Java frameworks such as Spring and ORM frameworks (Hibernate).
- Experience in designing the User Interfaces using HTML, CSS, JavaScript and JSP.
- Developed web applications in the open-source Java framework Spring, utilizing the Spring MVC framework.
- Experienced in front-end development using Ext JS, jQuery, JavaScript, HTML, Ajax, and CSS.
- Developed RESTful web services using Spring REST and the Jersey framework.
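The streaming work summarized above can be illustrated with a minimal sketch, assuming a Kafka topic of raw text events and an HDFS landing path; the broker addresses, topic, and paths below are placeholders rather than details from any specific engagement, and the sketch uses Spark Structured Streaming for brevity.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()

    // Read a stream of raw events from a Kafka topic (placeholder broker and topic names)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; keep the value as a string plus the event timestamp
    val events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Continuously append the events to HDFS as Parquet
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

In practice a job like this would be submitted with spark-submit against a YARN cluster, with the Spark Kafka connector package on the classpath.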
TECHNICAL SKILLS
Big Data: Apache Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Spark, Cloudera Manager and EMR
Databases: MySQL, Oracle, SQL Server, HBase
IDEs: Eclipse, NetBeans
Languages: C, Java, Pig Latin, UNIX shell scripting, Python
Scripting Languages: HTML, CSS, JavaScript, DHTML, XML, jQuery
Web Technologies: HTML, XML, JavaScript, jQuery
Web/Application Servers: Apache Tomcat, WebLogic
PROFESSIONAL EXPERIENCE
Confidential, Detroit, MI
Hadoop/Spark Developer
Responsibilities:
- Worked on analyzing data on the Hadoop cluster using various big data tools, including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark, and Kafka.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this project).
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce-backed jobs in Pig and Hive for data cleaning and pre-processing.
- Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Developed Spark scripts using the Scala shell as required.
- Performed transformation, cleaning, and filtering of imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Loaded data into HBase using both bulk and non-bulk loads.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs with control flows.
Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Python, Java, SQL scripting, Linux shell scripting, Cloudera, Cloudera Manager.
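A minimal sketch of the Spark-on-YARN work described in this project: querying a Hive table through Spark SQL and re-expressing a MapReduce-style aggregation as RDD transformations. The database, table, column, and HDFS path names are illustrative assumptions, not artifacts from the actual engagement.

```scala
import org.apache.spark.sql.SparkSession

object HiveAnalyticsJob {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read tables defined in the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-analytics")
      .enableHiveSupport()
      .getOrCreate()

    // DataFrame/Spark SQL path: aggregate directly against a Hive table (placeholder name)
    val dailyCounts = spark.sql(
      """SELECT event_date, COUNT(*) AS events
        |FROM analytics.click_events
        |GROUP BY event_date""".stripMargin)
    dailyCounts.show()

    // RDD path: the shape of a classic MapReduce job, rewritten as Spark transformations
    val sc = spark.sparkContext
    val counts = sc.textFile("hdfs:///data/raw/clicks")
      .flatMap(_.split("\\s+"))   // map phase: tokenize each line
      .map(token => (token, 1))   // emit (key, 1) pairs
      .reduceByKey(_ + _)         // reduce phase: sum per key
    counts.saveAsTextFile("hdfs:///data/out/click_token_counts")

    spark.stop()
  }
}
```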
Confidential, Overland Park, KS
Hadoop developer
Responsibilities:
- Designed and migrated the existing RAN MSBI system to Hadoop.
- Designed the control/job tables in HBase and MySQL, and created external Hive tables on HBase.
- Developed a batch processing framework to ingest data into HDFS, Hive, and HBase.
- Worked extensively on Hive and Pig to analyze network data.
- Automated data pulls from SQL Server into the Hadoop ecosystem via Sqoop.
- Tuned Hive and Pig job parameters along with native MapReduce parameters to avoid excessive disk spills, and enabled temp-file compression between jobs in the data pipeline to handle production-size data in a multi-tenant cluster environment (Ambari Views, analyze/explain plans, etc.).
- Designed workflows and coordinators in Oozie to automate and parallelize Hive and Pig jobs on an Apache Hadoop environment from Hortonworks (HDP 2.2).
- Hands-on experience writing complex Hive queries against external Hive tables dynamically partitioned on date, which store a rolling-window history of user viewing activity (see the sketch after this project).
- Experience performance-tuning Hive scripts, Pig scripts, and MapReduce jobs in a production environment by adjusting job parameters.
- Involved in data modeling.
- Delivered the Hadoop migration strategy, roadmap, and technology fitment.
- Designed and implemented HBase tables and Hive UDFs, taking complete ownership of the data.
- Automated many cross-technology tasks using shell scripting and crontab entries.
- Worked collaboratively with different teams to smoothly move the project to production.
- Built process automation for various jobs using Oozie.
- Worked with Hortonworks and AWS on real-time issues and brought them to closure.
- Used Apache Kafka to import real-time network log data into HDFS.
- POCs on moving existing Hive / Pig Latin jobs to Spark
- Deployed and configured Flume agents to stream log events into HDFS for analysis.
- Loaded data into Hive tables using HiveQL, including deduplication and windowing.
- Generated ad-hoc reports using Hive to validate customer viewing history and debug issues in production
- Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data-format definitions written for Hive.
- Configured Tableau against Hive data and used Spark, instead of MapReduce, as the execution engine for Tableau queries.
- Worked with different file formats (ORC, text) and compression codecs (gzip, Snappy, LZO).
- Worked with multiple input formats such as text, key-value, and sequence file input formats.
- Installed and configured various components of the Hadoop ecosystem and maintained their integrity.
- Planned production cluster hardware and software installation and coordinated with multiple teams to complete it.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Migrated data across clusters using DistCp.
- Experience collecting metrics for Hadoop clusters using Ambari.
- Worked with BI teams to generate reports in Tableau.
- Worked with Java development teams on data parsing.
- Wrote Java code to load source data into HDFS.
- Merged small files and loaded them into HDFS using Java code, with the merge history tracked in HBase.
- Involved in developing a multi-threaded environment to improve the performance of merge operations.
- Used the Hadoop Java API to develop the code.
- Involved in writing a Java program to add or remove headers from files.
Environment: HDFS, MapReduce, Spark, Pig, Hive, HBase, Flume, Sqoop
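The dynamically partitioned, deduplicated Hive loads described in this project ran as Hive HQL scripts scheduled through Oozie; the sketch below expresses the same windowing-plus-dynamic-partition pattern through Spark SQL so the examples stay in one language. The database, table, and column names are placeholders, not actual project objects.

```scala
import org.apache.spark.sql.SparkSession

object ViewingHistoryLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("viewing-history-load")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic partitioning must be enabled before inserting into date partitions
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Deduplicate with a window function, keeping the latest event per user/content pair,
    // then insert into a date-partitioned history table (placeholder names)
    spark.sql(
      """INSERT OVERWRITE TABLE ran.viewing_history PARTITION (view_date)
        |SELECT user_id, content_id, event_ts, view_date
        |FROM (
        |  SELECT user_id, content_id, event_ts, view_date,
        |         ROW_NUMBER() OVER (PARTITION BY user_id, content_id
        |                            ORDER BY event_ts DESC) AS rn
        |  FROM ran.viewing_history_stg
        |) dedup
        |WHERE rn = 1""".stripMargin)

    spark.stop()
  }
}
```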
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Experience administering, installing, upgrading, and managing CDH3, Pig, Hive, and HBase.
- Architected and implemented the product platform as well as all data transfer, storage, and processing from the data center to the Hadoop file system.
- Experienced in defining job flows.
- Implemented a CDH3 Hadoop cluster on CentOS.
- Worked on cluster installation, DataNode commissioning and decommissioning, NameNode recovery, capacity planning, and slot configuration.
- Wrote custom MapReduce programs for data processing in Java (a minimal sketch follows this project).
- Importing and exporting data into HDFS and Hive using Sqoop.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Created Hive tables backed by HDFS, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Flume to channel data from different sources into HDFS.
- Created HBase tables to store PII data in variable formats coming from different portfolios.
- Implemented best income logic using Pig scripts and wrote custom Pig UDFs to analyze data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Pig, ZooKeeper, Java, ETL, SQL, CentOS
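The custom MapReduce code in this project was written in Java; below is a minimal token-count sketch against the same Hadoop MapReduce API, written in Scala only to keep these examples in one language. Class names and the I/O paths (passed as arguments) are illustrative assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}

// Mapper: tokenize each input line and emit (token, 1)
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { t =>
      word.set(t.toLowerCase)
      ctx.write(word, one)
    }
}

// Reducer: sum the counts emitted for each token
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object TokenCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "token-count")
    job.setJarByClass(classOf[TokenMapper])
    job.setMapperClass(classOf[TokenMapper])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```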
Confidential
Java Developer
Responsibilities:
- Involved in coding of JSP pages for the presentation of data on the View layer in MVC architecture
- Used J2EE design patterns like Factory Methods, MVC, and Singleton Pattern that made modules and code more organized, flexible and readable for future upgrades
- Worked with JavaScript to perform client side form validations.
- Used Struts tag libraries as well as the Struts Tiles framework.
- Used JDBC to access the database with the Oracle thin (Type 4) driver for application optimization and efficiency (see the JDBC sketch after this project).
- Client side validation done using JavaScript.
- Used the Data Access Object (DAO) pattern to make the application more adaptable to future and legacy databases.
- Actively involved in tuning SQL queries for better performance.
- Worked with XML to store and read exception messages through DOM.
- Wrote generic functions to call Oracle stored procedures, triggers, functions.
Environment: JDK, J2EE, UML, Servlets, JSP, JDBC, Struts, XHTML, JavaScript, MVC, XML, XML Schema, Tomcat, Eclipse.
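The DAO and stored-procedure work in this project was done in Java on Struts; the sketch below shows the same JDBC pattern (a parameterized query plus a CallableStatement call over the Oracle thin driver), written in Scala only to keep these examples in one language. The connection URL, credentials, table, and procedure names are placeholder assumptions.

```scala
import java.sql.{Connection, DriverManager}

// Minimal DAO sketch: one lookup via a prepared statement and one stored-procedure call.
class AccountDao(url: String, user: String, password: String) {

  private def withConnection[A](body: Connection => A): A = {
    val conn = DriverManager.getConnection(url, user, password) // Oracle thin (Type 4) JDBC driver
    try body(conn) finally conn.close()
  }

  // Read a single column through a parameterized query to avoid SQL injection
  def findStatus(accountId: Long): Option[String] = withConnection { conn =>
    val stmt = conn.prepareStatement("SELECT status FROM accounts WHERE account_id = ?")
    try {
      stmt.setLong(1, accountId)
      val rs = stmt.executeQuery()
      if (rs.next()) Some(rs.getString("status")) else None
    } finally stmt.close()
  }

  // Invoke a stored procedure through the CallableStatement interface
  def archiveAccount(accountId: Long): Unit = withConnection { conn =>
    val call = conn.prepareCall("{ call archive_account(?) }")
    try { call.setLong(1, accountId); call.execute() } finally call.close()
  }
}

object AccountDaoDemo {
  def main(args: Array[String]): Unit = {
    val dao = new AccountDao("jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret")
    println(dao.findStatus(42L))
  }
}
```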
Confidential
Java Developer
Responsibilities:
- Developed a web-based application using a three-tier architecture.
- Used JSP for the GUI and Java Servlets to handle requests and responses.
- Implemented DAO connections to establish connections with database (Oracle 11g) to save and retrieve data.
- Performed Unit testing and security testing for the application.
- Created the Database, User, Environment, Activity, and Class diagram for the project (UML).
- Implemented the database using the Oracle database engine.
- Designed and developed a fully functional, generic n-tiered J2EE application platform in an Oracle-technology-driven environment. The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF BC and Oracle ADF Rich Faces.
- Created entity objects (business rules and policies, validation logic, default-value logic, security).
- Created view objects, view links, association objects, and application modules with data validation rules (exposing linked views in an application module), LOVs, dropdowns, value defaulting, and transaction-management features.
- Web application development using J2EE: JSP, Servlets, JDBC, JavaBeans, Struts, Ajax, JSF, JSTL, custom tags, EJB, JNDI, Hibernate, Ant, JUnit, Apache Log4j, Web Services, and Message Queue (MQ).
- Designed GUI prototypes using ADF 11g GUI components before finalizing them for development.
- Created reusable components (ADF Libraries and ADF Task Flows).
- Experience using version control systems such as CVS, PVCS, and Rational ClearCase.
- Created modules using bounded and unbounded task flows.
- Generated WSDLs (web services) and created workflows using BPEL.
- Handled AJAX functions (partial triggers, partialSubmit, autoSubmit).