Sr. Big Data/Cloud Developer Resume
Lake Forest, CA
SUMMARY:
- 8+ years of professional experience, including analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
- Hands-on experience using various Hadoop distributions (Apache, Hortonworks, Cloudera, MapR).
- Experience working with Amazon EMR, Cloudera (CDH3, CDH4 and CDH5) and Hortonworks Hadoop distributions.
- Expertise in Hadoop ecosystem tools including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Kafka, Spark, Zookeeper and Oozie.
- Good knowledge of EMR (Elastic MapReduce) for performing big data operations in AWS.
- Knowledge of working with Amazon Web Services (AWS), using EC2 for computing and S3 as the storage mechanism.
- Excellent understanding of Spark and its benefits in Big Data Analytics.
- Hands-on experience with stream processing frameworks such as Storm and Spark Streaming.
- Experience designing and developing a POC in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Hands-on experience using Scala, Spark Streaming and batch processing to process streaming and batch data.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience in data analysis using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Hands-on experience streaming live data from DB2 to HBase tables using Spark Streaming and Apache Kafka.
- Experienced in working with Hadoop big data technologies (HDFS and MapReduce programs), Hadoop ecosystem tools (HBase, Hive, Pig) and the NoSQL database MongoDB.
- Experience querying and analyzing data from Cassandra for quick searching, sorting and grouping through CQL.
- Applied machine learning and performed statistical analysis on the data.
- Scraped and analyzed data using machine learning algorithms in Python and SQL.
- Experience using the DataStax Spark Cassandra Connector to load data to and from Cassandra.
- Experience writing applications on NoSQL databases such as HBase, Cassandra and MongoDB.
- Extensive experience importing and exporting data using Flume and Kafka.
- Experience configuring Zookeeper to coordinate servers in clusters and to maintain data consistency.
- Expertise in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Experience developing data pipelines using Kafka to store data in HDFS.
- Experience migrating data using Sqoop from HDFS to relational database systems and vice versa according to client requirements.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Worked with NiFi to manage the flow of data from source to HDFS.
- Good understanding of Linux and Linux kernel internals and debugging.
- Good experience with source control systems such as CVS, Git and SVN.
- Experience working with different scripting technologies such as Python and UNIX shell scripts.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Experience working with the Spring and Hibernate frameworks in Java.
- Experience developing user interfaces using HTML, JSP and Java Swing.
- Used Spring Core annotations for dependency injection (Spring DI), Spring MVC for REST APIs and Spring Boot for microservices.
- Good understanding of and working experience with cloud-based architectures.
- Experience handling various file formats such as Avro, Parquet and Sequence files.
- Experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Expert implementation knowledge of enterprise, web and client-server applications using Java and J2EE.
- Expertise in Oracle ORMB and stored procedure concepts.
- Experience breaking a monolithic application into a microservices architecture.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box and black-box testing.
- Ability to work with onsite and offshore team members.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, HBase, Cassandra, MongoDB, Spark, Solr, Ambari, Hue, Avro, Mahout, Impala, Oozie, NiFi and Zookeeper
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Database: MySQL, Oracle 10g/11g, PL/SQL, MS SQL Server 2012
No-SQL Database: HBase, Cassandra and MongoDB
Programming Languages: C, C++, Java, JavaScript, Python, Scala
Frameworks: Struts, Spring, Hibernate, Spring Boot, Micro-services
Operating System: Windows 7/8/10, Vista, Ubuntu, Linux, UNIX, Mac OS
Cloud Platforms: AWS Cloud, Google Cloud
Application Servers: WebLogic, WebSphere, Tomcat
Architecture: Client-Server Architecture, Relational DBMS, OLAP, OLTP
Testing: Selenium WebDriver, JUnit
Modelling Tools: Visual Paradigm for UML, Rational Rose, StarUML
ETL Tools: Talend, Informatica, Tableau
IDE Tools: NetBeans, Eclipse, IntelliJ, Visual Studio Code
Build Tools: Maven, Jenkins
Development Methodologies: Waterfall, Agile/Scrum
PROFESSIONAL EXPERIENCE:
Confidential, Lake Forest, CA
Sr. Big Data/Cloud Developer
Responsibilities:
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Experience maintaining EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) in Amazon Web Services.
- Created S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Strong experience using Amazon Athena to analyze data in Amazon S3 using standard SQL.
- Experience moving data from Amazon S3 into the AWS Glue Data Catalog, then using an AWS Glue job, which leverages the Apache Spark Python API (PySpark), to transform the data from the Glue Data Catalog.
- Used AWS Glue jobs to move the transformed data into the Amazon Redshift data warehouse.
- Designed the ETL process and created the high-level design document including the logical data flows, source data extraction process, the database staging and the extract creation, source archival, job scheduling and Error Handling.
- Effectively migrated data from different source systems to build a secure data warehouse.
- Built data analytics on Spark which increased the revenue of the business.
- Integrated tools such as Elasticsearch with existing source systems.
- Implemented big data workflows using Oozie to ingest data from various sources into Hadoop; these workflows comprise heterogeneous jobs such as Hive, Sqoop and Python scripts.
- Implemented the project using Agile Scrum methodology and took part in daily stand-up meetings.
Environment: AWS, Elasticsearch, Hadoop, HDFS, Sqoop, Kafka, Hive, Oozie, Zookeeper, Spark-Core, Spark-SQL, Spark-Streaming, Scala, Python and Visual Studio Code.
Confidential, King of Prussia, PA
Sr. Hadoop/Spark Developer
Responsibilities:
- Involved in the design and development phases of the software development life cycle (SDLC) using Scrum methodology.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Worked with Elastic MapReduce (EMR) and set up the Hadoop environment on AWS EC2 instances.
- Stored data as objects in Amazon S3 buckets and created a data pipeline integrating Kafka and Spark Streaming with the data repository (S3 buckets).
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration and migration.
- Responsible for developing a data pipeline with AWS to extract data from web logs and store it in HDFS.
- Worked with NiFi to manage the flow of data from sources through automated data flows.
- Strong experience working with AWS EC2 for accessing Hadoop cluster components.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Experience with SparkContext, Spark SQL, DataFrames, RDDs and YARN.
- Experience using Spark Streaming APIs to perform on-the-fly transformations and actions for building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
- Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
- Experience querying data using Spark SQL on top of the Spark engine, implementing Spark RDDs in Scala.
- Experience implementing Kafka Java producers, creating custom partitions, configuring brokers and implementing high-level consumers to build the data platform.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Used IntelliJ IDE for developing Scala scripts for Spark jobs.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
- Ingested streaming data with Apache NiFi into Kafka.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and performed the data transformations using Spark Core.
- Developed a Kafka consumer API in Scala for consuming data from Kafka topics.
- Experience in performing advanced procedures like text analytics using in-memory computing capabilities of Spark using Scala.
- Good understanding of Cassandra architecture, replication strategy, gossip, snitch, etc.
- Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for data analysis and engineering roles.
- Created Hive tables as required, defining internal or external tables with appropriate static/dynamic partitions and bucketing for efficiency.
- Experience in data modeling, connecting to Cassandra from Spark and saving summarized DataFrames to Cassandra.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Used Kafka features such as distribution, partitioning and the replicated commit log service to maintain feeds for messaging systems.
- Developed and deployed Apache NiFi flows across various environments and optimized NiFi data flows.
- Analyzed large datasets to find patterns and insights within structured and unstructured data to support the business, using Tableau.
- Experience collecting log data from web servers and from the Cassandra NoSQL database and pushing it to HDFS using Flume.
- Proficient in NiFi and workflow schedulers that manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Experience with the CDH5 distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Experience managing and reviewing Hadoop log files.
- Integrated Kerberos with the Hadoop cluster to strengthen security and protect it from unauthorized access.
- Experience using Jira for bug tracking and Bitbucket to check in and check out code changes.
- Implemented the project by using Agile Methodology and attended Scrum Meetings daily.
Environment: AWS, Hadoop, Cloudera, YARN, HDFS, Sqoop, Cassandra, Spark-Core, Spark-SQL, Spark-Streaming, Java, Scala, Python, Apache Flume, Kafka, Hive, Kerberos, Tableau, NiFi, Zookeeper and IntelliJ.
Confidential, Stamford, CT
Hadoop/Spark Developer
Responsibilities:
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Experience in creating batch and real-time pipelines using Spark as the main processing framework.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive and HBase.
- Migrated an existing on-premises application to AWS and used AWS services such as EC2 and S3 for processing and storing small data sets; experienced in maintaining the Hadoop cluster on AWS EMR.
- Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
- Experience with Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Developed optimal strategies for distributing web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Worked on migrating MapReduce programs into Spark transformations using Spark with Scala.
- Experience working with Apache Spark, which provides a fast, general engine for large-scale data processing, integrated with the functional programming language Scala.
- Implemented Spark sample programs in Python using PySpark.
- Experience designing a reporting application that uses Spark SQL to fetch data and generate reports on HBase.
- Extensively used Spark SQL and PySpark APIs for querying and transforming data residing in Hive.
- Responsible for developing the data pipeline using Sqoop, Flume and Pig to extract data from weblogs and store in HDFS.
- Experience loading DStream data into Spark RDDs and performing in-memory computation to generate the output response.
- Experience handling continuous streaming data from different sources using Flume, with HDFS set as the destination.
- Experience loading data into HBase using both bulk load and non-bulk load.
- Experience designing and developing ETL workflows in Java for processing data in HDFS/HBase using Oozie.
- Experience using JIRA for bug tracking and CVS for version control.
- Hands-on experience loading generated HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
- Involved in using Oozie operational services for batch processing and scheduling workflows dynamically.
- Worked with the Scrum team to deliver agreed user stories on time for every sprint.
Environment: AWS (EMR, EC2, S3), Cloudera, MapReduce, Pig, Hive, Sqoop, Flume, PySpark, Spark, Scala, Java, HBase, Apache Avro, Oozie, Zookeeper, Elasticsearch, Kafka, Python, JIRA, CVS and Eclipse.
Confidential, Boston, MA
Java/Hadoop Developer
Responsibilities:
- Handled large amounts of data coming from different sources and was involved in HDFS maintenance and loading of structured and unstructured data.
- Wrote MapReduce jobs using Java API and Pig Latin.
- Experience working with Hadoop clusters using Hortonworks distributions.
- Launched and set up a Hadoop cluster, including configuring the different Hadoop components.
- Extracted and restructured data into MongoDB using the import and export command-line utilities.
- Developed MapReduce programs to clean and aggregate data.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Scheduled jobs using the Oozie workflow engine.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Wrote UDFs (User Defined Functions) in Pig and Hive when needed.
- Hands-on experience with J2EE components in the Eclipse IDE.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, and disaster recovery and backup; used MongoDB for distributed storage and processing with CRUD operations.
- Involved in loading data from the UNIX file system and FTP to HDFS.
- Handled Avro data files using Avro Tools and MapReduce.
- Developed custom loaders and storage classes in Pig to work with various data formats such as JSON, XML and CSV.
- Implemented data serialization using Apache Avro.
Environment: Hortonworks, Apache Hadoop 1.0.1, HDFS, MapReduce, Java, Talend, Pig, Hive, Sqoop, J2EE, Flume, MongoDB, MySQL, Apache Avro, Python, UNIX, Shell scripts and Eclipse.
Confidential, Thief River Falls, MN
Hadoop/Java Developer
Responsibilities:
- Monitored the health of MapReduce programs running on the cluster.
- Good knowledge of the MapReduce framework, including MR daemons, the sort and shuffle phases, and tasks.
- Experience with Cloudera Manager for management of the Hadoop cluster.
- Hands-on experience exporting results into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Built custom Talend jobs to ingest, enrich and distribute data in the Cloudera Hadoop ecosystem.
- Involved in creating MapReduce programs for some refined queries on big data.
- Implemented business logic by writing Pig and Hive UDFs for some aggregate operations and to retrieve their results.
- Created cubes in Talend to build different types of aggregations of the data and to visualize them.
- Developed Flume Agents for loading and filtering Streaming data into HDFS.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig Latin.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Created HBase tables, used HBase sinks and loaded data into them to perform analytics using Tableau.
- Developed Sqoop jobs to perform incremental imports into Hive tables.
- Moved bulk data into HBase using MapReduce integration.
- Worked with the Oozie workflow engine to run multiple Hive jobs.
Environment: Cloudera, HDFS, MapReduce, Java, Pig, Hive, Sqoop, Tableau, Flume, Talend, HBase, MySQL, Apache Avro, Python, Oozie and Eclipse.
Confidential
Java Backend Developer
Responsibilities:
- Worked in complete SDLC phases like Requirements, Specification, Design, Implementation and Testing.
- Designed and developed a system framework using J2EE technologies based on MVC architecture.
- Used JUnit for testing UI frameworks.
- Developed profile view web pages (add, edit) using HTML, CSS, jQuery, JavaScript, AJAX, DHTML and JSP custom tags as part of front-end development.
- Optimized XML parsers like SAX and DOM for the production data.
- Built the application using Maven scripts.
- Experience using Log4j for debugging.
- Performed client-side validation using JavaScript.
- Expertise in implementing the Struts MVC framework for developing J2EE web applications.
- Developed Spring and Hibernate data layer components for the application.
- Used SVN for version control.
- Implemented session beans using EJB 2.0.
- Implemented validations using JavaScript for the fields on Login screen and registration page.
- Experience developing web-based applications using Google Web Toolkit (GWT) and J2EE servlet technology.
- Good knowledge of JDBC connectivity.
- Developed the DAO layer for the application using Hibernate and JDBC.
- Designed and developed the application using Agile methodology and followed SCRUM.
Environment: HTML, CSS, jQuery, JavaScript, AngularJS, Java/J2EE, JDBC, Struts, Spring, Hibernate, JUnit, SVN, Maven, Ajax, Apache CXF, Jenkins, Log4j, EJB, Agile, Scrum and Web Services.
Confidential
Java/J2EE Developer
Responsibilities:
- Responsible for programming and troubleshooting web applications using HTML, CSS, JavaScript, Java, JSP and SQL Server.
- Developed and maintained the application UI using Eclipse.
- Developed client-side validations using JavaScript and jQuery.
- Involved in database design and in developing SQL queries and stored procedures on MySQL.
- Experience with Spring integration for communicating with business components; worked on Spring with Hibernate integration for ORM mappings.
- Deployed the application on WebSphere application server.
- Used the Struts ActionServlet as the front controller to route control to the appropriate J2EE component as required.
- Used Maven for project management and build automation, with continuous integration handled by Jenkins.
- Involved in deploying the application on the Tomcat web server.
- Helped the UI team integrate using Spring and RESTful services.
- Responsible for performing code reviews and debugging.
Environment: HTML, CSS, JavaScript, jQuery, Java/J2EE, JSP, XML, MySQL, Tomcat server, WebSphere, Spring, Hibernate and UNIX/Windows.
