Senior Big Data Developer Resume
Durham, NC
PROFESSIONAL SUMMARY:
- 8 years of IT industry experience, including 4 years of hands-on work with Big Data technologies.
- As a Hadoop and Spark Developer, worked in the Financial, Retail and Healthcare sectors on large data volumes, dealing extensively with Data Ingestion, Storage, Querying, Data Processing and Data Analysis on large data sets.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop Cluster.
- Experience with the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Kafka, Sqoop, ZooKeeper, YARN, Spark (PySpark & spark-shell), Cassandra and NiFi
- Experience in Python, Scala, Java, SQL and Shell programming
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting application architecture; good working knowledge of Spark and Kafka.
- Hands-on experience with Spark Core, Spark SQL and Spark Streaming using Scala and Python.
- Excellent understanding of Hadoop architecture and its components such as HDFS, NameNode, DataNode and the MapReduce programming paradigm.
- Worked on performance tuning of Hadoop jobs using techniques such as map-side joins, partitioning and bucketing (a short sketch follows this summary); good knowledge of NoSQL databases such as MongoDB and Cassandra.
- Worked with multiple file formats such as Avro, Parquet, CSV, JSON, SequenceFile and ORC.
- Experience utilizing Java tools in business, web and client-server environments, including Java, JDBC, Servlets, JSP, the Struts framework, JasperReports and SQL.
- Experienced in using Version Control Tools like Subversion, Git.
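A minimal sketch of the Hive performance-tuning techniques noted above (partitioning, bucketing and a broadcast map-side join), written against Spark SQL in Scala; the table and column names (sales_raw, sales_part, region_dim) are illustrative placeholders, not project specifics.

```scala
import org.apache.spark.sql.SparkSession

object HiveTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-tuning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned and bucketed table to prune scans and speed up joins.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_part (order_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (region STRING)
        |CLUSTERED BY (order_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert from a raw staging table (assumed to exist).
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE sales_part PARTITION (region)
        |SELECT order_id, amount, region FROM sales_raw""".stripMargin)

    // Broadcast (map-side) join of a small dimension table with the fact table.
    val joined = spark.sql(
      """SELECT /*+ BROADCAST(d) */ f.order_id, f.amount, d.region_name
        |FROM sales_part f JOIN region_dim d ON f.region = d.region_code""".stripMargin)
    joined.show(10)

    spark.stop()
  }
}
```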
TECHNICAL SKILLS:
DATA INGESTION: Sqoop, Kafka, Flume, HDFS Commands
DATA PROCESSING: Spark, YARN, Hive, Pig, MapReduce
DATA STORAGE: HBase, HDFS, MongoDB
DATA VISUALIZATION: Tableau, QlikView
LANGUAGES: C, C++, Python, Scala, Java, Shell, SQL
RELATIONAL DATABASES: MySQL, Oracle, SQL Server, IBM DB2
ETL: Talend, DataStage, ODI
MONITORING: Ambari, Cloudera Manager
DISTRIBUTIONS: Cloudera, Hortonworks
VERSION CONTROL, IDEs: Git, SVN, Eclipse, IntelliJ
BUILD TOOLS: Ant, Maven, Gradle, SBT
SDLC Methodologies: Agile Methodology, Waterfall Model
CLOUD: AWS (EMR, EC2, S3, DynamoDB)
PROFESSIONAL EXPERIENCE:
Senior Big Data Developer
Confidential, Durham, NC
Responsibilities:
- Design, development and testing of Apache Spark-based applications used for data loading (a brief sketch follows this list)
- Handle the installation, configuration and integration of any Big Data tools and frameworks required to provide requested capabilities
- Orchestrate Apache NiFi flows to move data from source to target while applying the necessary transformations
- Work in accordance with company’s compliance policies to maintain Data Security and Privacy
- Integrate Salesforce, Google Analytics and social media data using their REST APIs and available endpoints
- Perform various data-oriented operations such as data extraction, validation, cleansing, modeling and loading
- Integrate third-party frameworks and systems into DAP
- Design and develop Tableau reports per requirements, covering visualizations, trend analyses, and summary and detailed reports
- Create technical documentation and reports
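A minimal sketch, in Scala, of the kind of Spark-based data-loading application referenced in the first bullet above: read raw JSON landed in S3 on EMR, apply light validation, and load a partitioned Hive table. The bucket, paths and column names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataLoadJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("data-load-job")
      .enableHiveSupport()
      .getOrCreate()

    // Read raw JSON landed in S3 (placeholder bucket/prefix).
    val raw = spark.read.json("s3://example-landing-bucket/events/")

    // Basic cleansing: drop records without a key, derive the event date.
    val cleaned = raw
      .filter(col("event_id").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))

    // Load into a partitioned Hive table for downstream querying.
    cleaned.write
      .mode("overwrite")
      .partitionBy("event_date")
      .format("parquet")
      .saveAsTable("analytics.events_clean")

    spark.stop()
  }
}
```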
Environment: AWS EMR, AWS S3, Spark, Hive, Sqoop, Kafka, Oozie, Spark Core, Spark SQL, Maven, Scala, SQL, Linux, YARN, IntelliJ, Agile Methodology
Senior Hadoop Developer
Confidential, Trenton, NJ
Responsibilities:
- Ingested incremental batch data from MySQL and Teradata into HDFS using Sqoop at scheduled intervals
- Involved in ingesting real-time data into HDFS using Kafka and implemented the Oozie job for daily imports.
- Worked on Amazon Web Services (AWS) using Elastic MapReduce (EMR) for data processing with S3 for storage.
- Used different Spark modules such as Spark Core, Spark SQL, Spark Streaming, and Spark Datasets and DataFrames.
- Involved in converting files in HDFS from multiple data formats into RDDs and performing data cleansing using RDD operations (see the sketch after this list).
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Worked with various HDFS file formats such as Avro, ORC and SequenceFile, and compression formats such as Snappy and gzip.
- Integrated Oozie with the rest of Hadoop stack supporting several types of jobs as well as the system specific jobs (such as Java programs and shell scripts).
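A minimal Scala sketch of the RDD-based cleansing described above: read delimited text from HDFS into an RDD, drop malformed records, and convert the result to a DataFrame written as ORC. The paths, delimiter and record layout are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

object RddCleansingSketch {
  case class Customer(id: Long, name: String, state: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-cleansing").getOrCreate()
    import spark.implicits._

    // Read raw pipe-delimited files from HDFS into an RDD of lines.
    val lines = spark.sparkContext.textFile("hdfs:///data/raw/customers/")

    // Cleanse: drop blank lines and rows that do not parse into three valid fields.
    val customers = lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\|", -1))
      .filter(f => f.length == 3 && f(0).nonEmpty && f(0).forall(_.isDigit))
      .map(f => Customer(f(0).toLong, f(1).trim, f(2).trim.toUpperCase))

    // Convert the cleansed RDD to a DataFrame and persist as ORC.
    customers.toDF().write.mode("overwrite").orc("hdfs:///data/clean/customers/")

    spark.stop()
  }
}
```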
Environment: HDFS, Spark, Hive, Sqoop, Kafka, AWS EMR, AWS S3, Oozie, Spark Core, Spark SQL, Maven, Scala, SQL, Linux, YARN, IntelliJ, Agile Methodology
Spark & Hadoop Developer
Confidential, West Chester, PA
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop and Spark.
- Developing data ingestion pipelines using Sqoop and Kafka to ingest the database tables and streaming data into HDFS for analysis.
- Developing a Spark Streaming application to receive data streams from Kafka, process the continuous streams and trigger actions based on fixed events (a minimal sketch follows this list).
- Teamed up with architects to design a Spark Streaming model for the existing MapReduce model and migrate the MapReduce models to Spark using Scala.
- Using Hive to analyze the partitioned and bucketed data and compute various metrics for creating dashboards in Tableau.
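A minimal Scala sketch of the Kafka-to-Spark streaming flow described above, using Structured Streaming; the broker address, topic name and threshold rule are illustrative assumptions, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-stream-sketch").getOrCreate()
    import spark.implicits._

    // Continuous stream of messages from a Kafka topic (placeholder broker/topic).
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "transactions")
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    // Parse the JSON payload with an assumed two-field schema.
    val schema = new StructType()
      .add("txn_id", StringType)
      .add("amount", DoubleType)
    val parsed = stream.select(from_json($"value", schema).as("t")).select("t.*")

    // Example "fixed event": flag transactions above a threshold amount.
    val alerts = parsed.filter($"amount" > 10000)

    // Write triggered alerts to the console; a real job would act on them.
    val query = alerts.writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```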
Environment: Hadoop - HDFS, Spark - SQL & Streaming, Kafka, Sqoop, Hive, Core Java, Scala, Unix Shell Scripting, Oozie workflows, Ambari - Hortonworks, Informatica PowerCenter.
Big Data Engineer
Confidential, Grapevine, TX
Responsibilities:
- Implemented Kafka consumers for HDFS and Spark Streaming
- Utilized Sqoop, Kafka, Flume and Hadoop FileSystem APIs to implement data ingestion pipelines from heterogeneous data sources
- Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.
- Worked on real time streaming, performed transformations on the data using Kafka and Spark Streaming.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing (see the sketch after this list).
- Created data pipelines for ingestion and aggregation events, loaded consumer-response data from AWS S3 buckets into Hive external tables, and generated views to serve as feeds for Tableau dashboards.
- Worked on various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet and XML.
- Used Apache NiFi to automate data movement between different Hadoop components and convert raw XML data into JSON and Avro.
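A minimal Scala sketch of the Spark SQL access to Hive tables mentioned above: enable Hive support, query an existing external table and cache the aggregate for reuse. The database, table and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HiveAccessSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-access-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Query an existing Hive external table directly with Spark SQL.
    val responses = spark.sql(
      """SELECT campaign_id, response_type, COUNT(*) AS responses
        |FROM marketing.consumer_responses
        |GROUP BY campaign_id, response_type""".stripMargin)

    // Cache the aggregate for repeated downstream use (e.g. dashboard feeds).
    responses.cache()
    responses.show(20)

    spark.stop()
  }
}
```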
Environment: Hadoop, HDFS, AWS, Scala, Kafka, MapReduce, YARN, Spark, Pig, Hive, Java, NiFi, HBase, IMS Mainframe, Maven.
Java Developer
Confidential
Responsibilities:
- Developed the application using Struts Framework that leverages Model View Controller (MVC) architecture
- Implemented design patterns such as Singleton, Factory and MVC
- Deployed the applications on IBM WebSphere Application Server.
- Worked on JavaScript, CSS stylesheets, Bootstrap and jQuery.
- Worked one-on-one with the client to develop the layout and color scheme for the website and implemented the final interface design with HTML5/CSS3 and JavaScript using Dreamweaver.
- Used advanced HTML5, JavaScript, jQuery, CSS3 and pure CSS layouts (table-less layouts)
- Wrote SQL queries to extract data from the Oracle & MySQL databases.
- Involved in JUnit testing for all test case scenarios and verifying the functionality of the application
- Used CVS for version control across common source code used by developers.
Environment: Python, Java, Oracle 11g Express, CVS, Struts, Spring 3.0, HTML, CSS, JavaScript, Apache Tomcat, Eclipse IDE, REST, Maven, JUnit
Junior Java Developer
Confidential
Responsibilities:
- Responsible for designing Rich user Interface Applications using JavaScript, CSS, HTML and AJAX.
- Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller and MVC.
- Developed EJB components to implement business logic using Session and Message-Driven Beans.
- Excellent working experience with Oracle 10g, including storing and retrieving data using Hibernate.
- Built and deployed the application on WebLogic Application Server.
- Developed and executed Unit Test cases using JUnit framework by supporting TDD.
- Provided extensive pre-delivery support through bug fixing and code reviews.
Environment: J2EE, JDK 1.5, Spring 2.5, Struts 1.2, JSP, Servlets, EJB 3.0, Hibernate 3.0, Oracle 10g, PL/SQL, CSS, Ajax, HTML, JavaScript, Log4j, JUnit, SOAP, Web Services.