Hadoop/Spark Developer Resume
Chicago, IL
SUMMARY
- 8+ years of practical experience in building industry-specific Java applications and implementing Big Data technologies in core and enterprise software development.
- 4+ years of experience in developing applications that perform large scale Distributed Data Processing using Big Data ecosystem tools like Hadoop, Hive, Pig, Sqoop, Hbase, Cassandra, Spark, Spark Streaming, Spark SQL, Mahout, Oozie, ZooKeeper, Flume, Kafka and Yarn.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands on experience in using various Hadoop distributions (Cloudera, Hortonworks, MapR).
- Deep knowledge of Spark architecture and how RDDs work internally. Exposure to Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala.
- Hands-on experience designing and developing Spark applications in Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience in implementing real-time event processing and analytics using stream-processing frameworks like Spark Streaming.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions. Streamed the data in real time using Spark with Kafka for faster processing.
- Exposure in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs.
- Good working experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Expertise in job scheduling and coordination tools like Oozie and ZooKeeper.
- Hands-on experience in provisioning and managing multi-tenant Cassandra cluster on public cloud environment - Amazon Web Services (AWS) - EC2, S3, Lambda, Route 53.
- Hands-on experience running ad-hoc queries on structured data using HiveQL, and using partitioning, bucketing, and joins in Hive for faster data access.
- Extensively worked on Hive, Pig and Sqoop for sourcing and transformations.
- Experience in data processing like collecting, aggregating, moving from various sources using Apache Flume and Kafka.
- Experience creating HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Experience working on Solr to develop search engine on unstructured data in HDFS.
- Used Solr to enable indexing for searching Non-primary key columns from Cassandra key spaces.
- Developed quality code adhering to Scala coding standards and best practices.
- Created User Defined Functions (UDFs), User Defined Aggregated Functions (UDAFs) in PIG, Hive.
- Hands-on exposure to Amazon Web Services, the AWS command line interface, and Data Pipeline.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Written MapReduce programs in Java for data extraction, transformation and aggregation from various file formats like XML, JSON, CSV, Avro, Parquet, ORC, Sequence, txt etc.
- Worked on the Talend ETL tool and used features like context variables and database components such as tOracleInput, tOracleOutput, tFileCompare, tFileCopy, and tOracleClose.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
- Well-versed in web technologies such as HTML, CSS, and JavaScript.
- Good understanding of Data Mining and Machine Learning techniques.
- Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Testing and maintenance) using Waterfall and Agile, Scrum methodologies.
TECHNICAL SKILLS
Bigdata Technologies: HDFS, Map Reduce, Pig, Hive, Sqoop, Oozie, Storm, Scala, Spark, Apache Kafka, Flume, Impala, Solr, Elastic Search, Ambari, Ab Initio
Database: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012
SQL Server Tools: Enterprise Manager, SQL Profiler, Query Analyzer, SQL Server 2008, SQL Server 2005 Management Studio, DTS, SSIS, SSRS, SSAS
Language: C, C++, Java, Python, Scala
AWS Components: S3, EMR, EC2, Lambda, VPC, Route 53, Cloud Watch
Development Methodologies: Agile, Waterfall
Testing: Junit, Selenium Web Driver
NO-SQL Databases: HBase, Cassandra, MongoDB, Neo4j, Redshift, Redis
ETL Tools: Talend Open Studio, Pentaho, Tableau, Informatica
IDE Tools: Eclipse, NetBeans, Intellij
Modelling Tools: Rational Rose, StarUML, Visual paradigm for UML
Architecture: Relational DBMS, Client-Server Architecture
Cloud Platforms: AWS Cloud
Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
PROFESSIONAL EXPERIENCE
Confidential, Chicago IL
Hadoop/Spark Developer
Responsibilities:
- Worked on Sqoop for loading data from different relational databases to HDFS.
- Developed Kafka producer and consumer components for real time data processing.
- Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
- Experience in designing Kafka for multi data center cluster and monitoring it.
- Designed number of partitions and replication factor for Kafka topics based on business requirements.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Worked on physical transformations of data model which involved in creating Tables, Indexes, Joins, Views and Partitions.
- Involved in Analysis, Design, System architectural design, Process interfaces design, design documentation.
- Created Tableau visualizations for the internal management (client team) using the Simba Spark SQL Connector.
- Collected the data using Spark Streaming and loaded it into the Cassandra cluster.
- Imported data from various sources to the Cassandra cluster using Sqoop. Worked on creating data models for Cassandra from the existing Oracle data model.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Performed unit testing for Spark and Spark Streaming with ScalaTest and JUnit.
- Developed PySpark and Scala code to cleanse and perform ETL on the data at different stages of the data pipeline.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala.
- Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources.
- Developed Pyspark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used AWS to export MapReduce jobs into Spark RDD transformations.
- Developed ETL processes to transfer data from different sources, using Sqoop, Impala, and bash.
- Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
- Leveraged Tableau to perform visualizations on the collected data.
- Set up Solr for distributed indexing and search, and wrote Java code to format XML documents and upload them to the Solr server for indexing.
- Currently working in Spark and Scala for Data Analytics. Handle ETL Framework in Spark for writing data from HDFS to Hive.
- Worked with Kerberos and integrated it with the Hadoop cluster to secure it against unauthorized access.
- Involved in writing a Java API for AWS Lambda to manage some of the AWS services.
- Used ZooKeeper extensively as a job scheduler for Spark jobs.
- Developed predictive analytics using Apache Spark Scala APIs. Wrote Scala classes to interact with the database and used a Scala-based framework for ETL.
- Worked on a POC of Talend with Hadoop. Improved the performance of the Talend jobs.
- Involved in automating the FTP process in Talend and transferring the files via FTP in UNIX.
- Analyzed the data by running Hive queries on the existing database. Designed and implemented partitioning (static, dynamic) and buckets in Hive.
- Used Pig to convert fixed-width files to delimited files.
- Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in Cassandra Data modelling to create key spaces and tables in multi Data Center DSE Cassandra DB.
Environment: Hadoop, Map Reduce, HDFS, AWS, Hive, Java, Eclipse, Apache Kafka, Pig, Linux, PL/SQL, Cassandra, Impala, Scala, Spark, Spark Streaming, Lambda, Sqoop, Solr, Agile, Talend, Cloudera CDH.
Confidential, Chicago IL
Hadoop/Spark Developer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Experience on working with Hortonworks distribution.
- Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data, and used Spark transformations for data wrangling and ingesting real-time data in various file formats.
- Loaded CSV, TXT, Avro, and Parquet files using Scala/Java in Spark, processed the data with Spark DataFrames and RDDs, and saved the output in Parquet format in HDFS to load into a fact table using ORC Reader.
- Experienced with Spark Context, Spark-SQL, Data Frame, Datasets, Spark YARN.
- Knowledge in using NiFi to automate teh data movement between different Hadoop systems.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming and performed the necessary transformations and aggregations.
- Enhanced HIVE queries performance using TEZ for Customer Attribution datasets.
- Analyzed the data by running Hive queries and Pig scripts, and developed simple to complex MapReduce jobs using Hive, Pig, and Python.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources. These pipelines used customized UDFs to extend the ETL functionality.
- Strong hands on experience using Teradata utilities (SQL, B-TEQ, Fast Load, MultiLoad, FastExport, Tpump, Visual Explain, Query man, WinDDI).
- Developed Hive queries and UDFs to analyze and transform the data in HDFS. Developed Hive scripts for implementing control table logic in HDFS.
- Designed and Implemented Partitioning (Static, Dynamic), Buckets in HIVE.
- Developed Pig scripts and UDFs as per the business logic.
- Implemented functionality using Mahout to display the products best suited to user profiles by performing sentiment analysis of the products, and performed trend analysis.
- Configured Flume to extract data from web server output files and load it into HDFS.
- Performed de-normalization of data using PIG scripts.
- Worked on No-SQL databases like Hbase, MongoDB for POC purpose in storing images and URIs.
- Performed data analysis with HBase using Hive external tables. Exported the analyzed data to HBase using Sqoop to generate reports for the BI team.
- Worked on MongoDB for distributed storage and processing.
- Imported data from relational databases to the Hadoop cluster using Sqoop.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Responsible for building scalable distributed data solutions using Hadoop.
- Monitored Hadoop cluster job performance and performed capacity planning. Provided the architectural design to business users.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Installed Oozie workflow engine to automate Map/Reduce jobs.
- Built the Hadoop cluster and sized it based on the data extracted from all the sources.
Environment: Java, Hadoop, Python, Map Reduce, Hive, Pig, Hortonworks, Flume, NiFi, Sqoop, HBase, MongoDB, Spark.
Confidential, Eastpointe, MI
Hadoop Developer
Responsibilities:
- Involved in understanding business requirements, preparing design documents, coding, and testing.
- Extracted data using Sqoop Import query from multiple databases and ingest into Hive tables.
- Involved in developing the Spark framework to provide structure to data on the fly and process the data using Spark core APIs, DataFrames, Spark SQL, and Scala.
- Evaluated and improved application performance with Spark.
- Implemented new dimensions in the Spark application based on the business requirements.
- Worked on streaming pipeline that uses Spark to read data from Kafka, transform it and write it to HDFS.
- Involved in working with Spark on top of YARN for interactive and batch analysis.
- Implemented POC on writing programs in Scala using Spark and worked on migrating MapReduce programs into Spark using Scala.
- Implemented Spark SQL to update queries based on the business requirements.
- Convert MR algorithms into Spark transformations and actions by creating RDDs, pair RDDs.
- Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
- Used cloud computing on a multi-node cluster, deployed the Hadoop application on S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Hands-on expertise in running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
- Initiated the Spark context and processed live streaming information with the help of RDDs using Kafka and Kafka brokers.
- Developed data pipelines using Kafka, Spark and Hive to ingest, transform and analyze logs.
- Managed and reviewed Hadoop and HBase log files. Worked on HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
- Auto Populate HBase tables with data coming from Kafka sink.
- Used Spark SQL to load data from the formats it currently resides in (such as JSON, Parquet, or a database), transform it, and expose it for ad-hoc/interactive querying.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Proficient in using Spark to load data from the local file system, HDFS, and relational databases; importing data into RDDs using Spark SQL; and ingesting data from a range of sources using Spark Streaming with JSON output, loading it into HDFS and text files.
- Integrated various business validation and prep flag rules from campaign managers and business analysts on the data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala.
- Extracted large datasets from Cassandra and Oracle servers into HDFS and vice versa using Sqoop.
- Deployed Spark application and Java web services in pivotal cloud foundry.
- Used AUTOMIC job scheduler for scheduling multiple applications in Production.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Handled production deployments and provided production support for fixing defects.
- Strong communication and analytical skills and a demonstrated ability to handle multiple tasks as well as work independently or in a team.
Environment: Apache Hadoop, Apache Spark, Apache Kafka, Apache Hive, Sqoop, Cassandra, EMR, Spring Boot, Pivotal Cloud Foundry, JUnit, Angular JS, IntelliJ, Maven, Team City, Automic and Git Hub.
Confidential, Summit, NJ
Java/ Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Map Reduce, HDFS, developed multiple Map Reduce jobs in java for data cleaning and preprocessing.
- Migrated existing SQL queries to HiveQL queries to move to big data analytical platform.
- Responsible to manage data coming from different sources.
- Created Hive external tables and managed tables, designed data models in Hive.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Loaded and transformed large sets of data into HDFS using Hadoop fs commands.
- Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
- Implemented UDFs and UDAFs in Java and Python for Hive to handle processing that cannot be performed using Hive built-in functions.
- Did various performance optimizations like using distributed cache for small datasets, partition and bucketing in hive, doing map side joins etc.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop for analysis, visualization and to generate report.
- Involved in writing optimized Pig Script along with developing and testing Pig Latin Scripts
- Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
- Designed the logical and physical data models and wrote DML scripts for an Oracle 9i database.
- Used Hibernate ORM framework with Spring framework for data persistence.
- Wrote test cases in Junit for unit testing of classes.
- Involved in templates and screens in HTML and JavaScript.
Environment: Java, HDFS, Cassandra, Map Reduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive.
Confidential
Java Developer
Responsibilities:
- Played an active role in the team by interacting with welfare business analysts and program specialists, and converted business requirements into system requirements.
- Developed analysis level documentation such as Use Case, Business Domain Model, Activity & Sequence and Class Diagrams.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Implemented Services using Core Java.
- Developed and deployed UI layer logics of sites using JSP.
- Struts (MVC) is used for implementation of business model logic.
- Worked with Struts MVC objects like Action Servlet, Controllers, and validators, Web Application Context, Handler Mapping, Message Resource Bundles and JNDI for look-up for J2EE components.
- Developed dynamic JSP pages with Struts.
- Used built-in/custom Interceptors and Validators of Struts.
- Developed the XML data object to generate PDF documents and other reports.
- Used Hibernate, DAO, and JDBC for data retrieval and modifications from the database.
- Messaging and interaction of Web Services is done using SOAP.
- Developed JUnit test cases for unit testing as well as system and user test scenarios.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
- Implemented mid-tier business services to integrate UI requests to DAO layer commands.
Environment: J2EE, JDBC, Java 1.4, Servlets, JSP, Struts 1.2, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, Junit, Oracle 10g, My Eclipse
