Sr. Hadoop/Spark Developer Resume
Itasca, IL
SUMMARY
- Highly skilled and experienced IT professional with 8+ years of experience, including 4 years as a Hadoop Developer in Big Data/Hadoop technology development and 5 years as a Java developer.
- 4+ years of experience in Hadoop ecosystem components like MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and ZooKeeper.
- In-depth understanding of Hadoop architecture including YARN and various components such as HDFS, Resource Manager, Node Manager, NameNode, DataNode and MR v1 & v2 concepts.
- Good knowledge of creating data pipelines in Spark using Scala.
- Good knowledge of Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.
- Hands-on experience in streaming live data from DB2 to HBase tables using Spark Streaming and Apache Kafka.
- Experience in developing Spark programs for batch and real-time processing, including Spark Streaming applications for real-time processing.
- Strong knowledge of implementing data processing on Spark Core using Spark SQL, MLlib and Spark Streaming.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core.
- Experience in using Spark SQL with various data sources like JSON, Parquet and Hive.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to an output directory in HDFS.
- Expertise in integrating data from multiple data sources using Kafka.
- Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing using Storm topologies.
- Worked extensively with Hadoop distributions like Cloudera and Hortonworks. Good knowledge of the MapR distribution and Amazon EMR.
- Wrote and implemented custom UDFs in Pig for data filtering.
- Expertise in writing Hive and Pig queries for data analysis to meet business requirements.
- Hands-on experience in using Impala for data analysis.
- Hands-on experience in using the data ingestion tools Sqoop and Flume.
- Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice-versa.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Worked on NoSQL databases like HBase, Cassandra and MongoDB.
- Good knowledge of job scheduling and coordination tools like Oozie and ZooKeeper.
- Experience in configuring ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
- Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
- Hands-on experience with build, logging and testing tools like Maven, Ant, Log4j and JUnit.
- Experience in working with the Spring and Hibernate frameworks in Java.
- Extensive experience with databases such as Oracle, MySQL and MS SQL Server, and with PL/SQL scripting.
- Experience in using IDEs like Eclipse, NetBeans and IntelliJ.
- Experience with web UI development using jQuery, CSS, HTML, HTML5, XHTML and JavaScript.
- Working experience with Linux distributions like Red Hat and CentOS.
- Experience with ETL concepts using Informatica PowerCenter, and with OLAP and OLTP.
- Good knowledge of AWS components like EC2, S3 and EMR.
- Comprehensive knowledge of the Software Development Life Cycle (SDLC).
- Exposure to Waterfall, Agile and Scrum models.
- Strengths include handling a variety of software systems, the capacity to learn and adapt to new technologies, and being an amicable, career-focused team player with strong personal, technical and communication skills.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, Kafka, Flume, Sqoop, YARN, Impala, Oozie, Zookeeper, Spark, MongoDB, Cassandra, Avro, Storm, Ambari, Solr, Mahout, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Amazon EMR.
Languages: Java, Python, SQL, HTML, Scala, JavaScript, XML and C/C++.
RDBMS: Oracle 10g/11g, MS SQL Server, MySQL, Teradata.
NoSQL Databases: HBase, Cassandra, MongoDB, Neo4j.
Java Technologies: JavaBeans, JSP, JDBC, Servlets, JNDI, EJB and Struts.
Development Methodologies: Agile, Waterfall.
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON.
Development Tools: Eclipse, NetBeans, IntelliJ, Ant, Maven, JUnit and Log4j.
Cloud Platforms: AWS cloud, Google Cloud.
AWS Components: S3, EMR, EC2 and CloudWatch.
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat.
DB Languages: Oracle, MySQL, PL/SQL and PostgreSQL.
Operating Systems: Linux, UNIX, macOS and Windows.
ETL Tools: Informatica, Pentaho, Talend.
PROFESSIONAL EXPERIENCE
Sr. Hadoop/Spark Developer
Confidential, Itasca, IL
Responsibilities:
- Designed and deployed a Hadoop cluster and various Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark and Impala with the Cloudera distribution.
- Worked with the Cloudera distribution deployed on AWS EC2 instances.
- Used Cloudera Hue to import data through the GUI.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Performed data ingestion from multiple internal clients using Apache Kafka.
- Implemented a real-time system with Kafka, Storm and ZooKeeper.
- Integrated Apache Kafka with Spark Streaming to consume data from external REST APIs and run custom functions.
- Set up and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Tuned the performance of Spark jobs using caching and by taking full advantage of the cluster environment.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
- Processed input from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
- Scheduled jobs using Oozie as the workflow engine. Developed wrapper shell scripts around the Oozie workflows.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Ran Hadoop Streaming jobs to process terabytes of text data. Worked with different file formats such as text, SequenceFile, Avro, ORC and Parquet.
- Configured, supported and maintained all network, firewall, storage, load balancers, operating systems, and software in AWS EC2.
- Worked on storing the database in S3 by connecting Cassandra to the Amazon EMR File System (EMRFS).
- Implemented the use of Amazon EMR for Big Data processing on a Hadoop cluster of virtual servers on Amazon EC2 and S3.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Developed ad-hoc Hive and Pig queries required by business users to generate data metrics.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Impala to analyze data ingested into Hive tables.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Good knowledge of the MLlib framework for auto-suggestions.
- Good knowledge of data manipulation, tombstones and compactions in Cassandra. Well experienced in avoiding faulty writes and reads in Cassandra.
- Performed data analysis with Cassandra using Hive external tables.
- Involved in creating data models for customer data using Cassandra Query Language (CQL) for Cassandra clusters.
- Analyzed the SQL scripts and designed the solution to implement using PySpark.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML data.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
- Implemented the YARN Capacity Scheduler in various environments and tuned configurations according to application-wise job loads.
- Implemented ETL standards utilizing proven data processing patterns with open source standard tools like Talend and Pentaho for more efficient processing.
- Developed MapReduce ETL in Java/Pig and data validation using Hive.
- Involved in loading data from the Linux file system to HDFS.
- Followed Agile methodologies while working on the project.
Environment: Hadoop, HDFS, Hive, Spark, Cloudera, AWS EC2, S3, EMR, Sqoop, Kafka, Spark MLlib, PySpark, YARN, Shell Scripting, Impala, Scala, Pig, Cassandra, Oozie, Java, JUnit, Agile methods, Linux, MySQL.
Hadoop Developer
Confidential, Flowood, MS
Responsibilities:
- Worked in a multi-cluster Hadoop ecosystem environment.
- Created MapReduce programs using the Java API that filter out unnecessary records and find unique records based on different criteria.
- Optimized MapReduce programs using combiners, partitioners and custom counters for delivering the best results.
- Converted the existing relational database model to the Hadoop ecosystem.
- Installed and configured Apache Hadoop, Hive and Pig environment.
- Worked with Linux systems and RDBMS databases on a regular basis so that data could be ingested using Sqoop.
- Reviewed and managed all log files using HBase.
- Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
- Created Hive tables and worked on them using HiveQL.
- Developed a data pipeline using Flume and Spark to store data in HDFS.
- Performed Big Data processing using Spark, AWS and Redshift.
- Involved in data acquisition, data pre-processing and data exploration for a telecommunication project in Spark.
- Involved in performing linear regression using Spark MLlib in Scala.
- Continuously monitored and managed the Hadoop cluster through HDP (Hortonworks Data Platform).
- Implemented frameworks using Java and Python to automate the ingestion flow.
- Loaded the CDRs from relational databases using Sqoop, and from other sources into the Hadoop cluster using Flume.
- Implemented data quality checks and transformations using Flume interceptors.
- Implemented collections and used the aggregation framework in MongoDB.
- Processed large volumes of data and executed processes in parallel using Talend functionality.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Designed and implemented batch jobs using MR2, Pig, Hive and Tez.
- Used Apache Tez for highly optimized data processing.
- Developed Hive queries to analyze the output data.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS.
- Developed custom Pig UDFs for custom input formats to perform various levels of optimization.
- Involved in maintaining the Hadoop clusters using a Nagios server.
- Used Pig to import semi-structured data coming from Avro files to make serialization faster.
- Loaded data into HBase using bulk load and non-bulk load.
- Used Spark for fast processing of data in Hive and HDFS.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Used ZooKeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Worked with reporting tools like Tableau, connecting to Hive to generate daily reports.
- Utilized Agile Scrum methodology.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Scala, Flume, Sqoop, Hortonworks, AWS, Redshift, Oozie, ZooKeeper, Avro, Python, Shell Scripting, SQL, Talend, Spark, HBase, MongoDB, Linux, Kafka.
Hadoop Developer
Confidential, Alexandria, VA
Responsibilities:
- Worked on importing data from various sources and performed transformations using Map Reduce and Hive to load data into HDFS.
- Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Hands-on experience with Hive queries and functions for evaluation, filtering, loading and storing of data.
- Developed HiveQL queries, mappings, tables and external tables in Hive for analysis across different banners and worked on partitioning, optimization, compilation and execution.
- Created HBase tables to store variable data formats coming from different portfolios. Performed real-time analytics on HBase using the Java API and REST API.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Pivoted HDFS data from rows to columns and columns to rows.
- Moved all log/text files generated by various products into HDFS location.
- Experienced in managing and reviewing the Hadoop log files.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Wrote MapReduce code that takes log files as input, parses the logs and structures them in tabular format to facilitate effective querying on the log data.
- Used Hive and Impala in the data exploration stage to gain insights into the customer data.
- Joined different data sets using Pig join operations and ran queries using Pig scripts.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Actively involved in loading data from UNIX file system to HDFS.
- Used Sqoop for importing and exporting data into HDFS and Hive.
- Used Flume to import the web logs.
- Developed Shell scripts to automate routine DBA tasks.
- Performed code reviews to ensure that the code functionality meets business requirements and that standards are followed.
- Implemented test scripts for supporting test driven development and continuous integration.
- Followed Agile Methodology.
Environment: HDFS, Map Reduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Cloudera, Eclipse, Agile, Unix and Shell Scripting.
Java Developer
Confidential
Responsibilities:
- Designed use cases for the application as per the business requirements.
- Involved in various phases of Software Development Life Cycle (SDLC).
- Developed the user interfaces using Struts, JSP, JSTL, HTML, Ajax and JavaScript.
- Involved in the creation of a queue manager in WebSphere MQ along with the necessary WebSphere MQ objects required for use with WebSphere Data Interchange.
- Experience with SOAP web services and WSDL.
- Used Ant scripts to automate application build and deployment processes.
- Involved in the design, development and modification of PL/SQL stored procedures, functions, packages and triggers to implement business rules in the application.
- Used RESTful web services with MVC for parsing and processing XML data.
- Developed ETL processes to load data from flat files, SQL Server and Access into the target Oracle database, applying business logic in transformation mappings to insert and update records during loading.
- Deployed web applications on Tomcat and JBoss server.
- Involved in creating the user authentication page using Java Servlets.
- Migrated data source passwords to encrypted passwords using the Vault tool on all the JBoss application servers.
- Used Spring Framework for dependency injection and integrated it with Hibernate.
- Used JMS for asynchronous communication between different modules.
- Actively involved in code reviews and in bug fixing.
- Followed Agile software methodology for project development.
Environment: Java, J2EE, Servlets, HTML, XHTML, CSS, JavaScript, Struts 1.1, Spring, JSP, JMS, JBoss 4.0, REST, SQL Server 2000, Ant, CVS, PL/SQL, MVC, Hibernate, Eclipse, Linux.
Java Developer
Confidential
Responsibilities:
- Good understanding of installing, configuring and deploying software by gathering all the requirements needed.
- Performed quality assurance testing.
- Helped design the application using the Spring MVC framework, with front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery and AJAX.
- Implemented front-end and server-side validations using JavaScript, shell scripts and JSP.
- Involved in writing SQL queries for fetching data from Oracle database.
- Developed a multi-tiered web application using J2EE standards.
- Used JIRA to track bugs.
- Used Apache Axis to develop web services and the SOAP protocol for web services communication.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Used the Apache Tomcat application server for deploying and configuring the application.
- Used JUnit to test persistence and service tiers. Involved in unit test case preparation.
- Hands-on experience in the software configuration / change control process and tools like Subversion (SVN), Git, CVS and ClearCase.
- Built and deployed the application using Maven.
- Followed Agile and Scrum methodology.
- Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.
Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Maven, MVC, Agile, Git, JIRA, SVN.
