
Sr. Big Data Architect/Developer Resume


Minneapolis, MN

SUMMARY:

  • 9+ years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
  • 4+ years of experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie, and AWS.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server, and MySQL databases.
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Strong experience working with different Hadoop distributions, including Cloudera, Hortonworks, MapR, and Apache.
  • Experience installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
  • Experience with AWS services such as EMR, EC2, S3, CloudFormation, and Redshift, which provide fast and efficient processing of Big Data.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, MapReduce, Hadoop 2 HDFS Federation, High Availability, and YARN, along with a good understanding of workload management, scalability, and distributed platform architectures.
  • Good understanding of R programming, data mining, and machine learning techniques.
  • Strong experience and knowledge of real-time data analytics using Storm, Kafka, Flume, and Spark.
  • Experience in troubleshooting errors in HBase Shell, Pig, Hive, and MapReduce.
  • Upgraded existing MongoDB instances from version 2.4 to version 2.6, upgrading security roles and adopting newer features.
  • Experience extending Hive and Pig core functionality with custom UDFs and UDAFs (see the sketch after this list); performed reads and writes in Cassandra from a web application using Java JDBC connectivity.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm and in debugging MapReduce jobs using counters and MRUnit tests.
  • Good experience with Spark MLlib algorithms such as classification, clustering, and regression, and a good understanding of Spark Streaming with Kafka for real-time processing.
  • Extensive experience working with Spark features such as RDD transformations, Spark MLlib, and Spark SQL.
  • Experienced in moving data from different sources using Kafka producers and consumers and preprocessing data with Storm topologies.
  • Experienced in migrating ETL transformations to Pig Latin scripts, including join operations, with knowledge of data warehousing and ETL tools such as Talend and Pentaho.
  • Good knowledge of streaming data from sources such as log files, JMS, and application feeds into HDFS using Flume, and a good understanding of MPP platforms such as HP Vertica, Greenplum, and Impala.
  • Experienced with cluster monitoring tools such as Cloudera Manager, Ambari, and Ganglia.
  • Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, and EJB, and worked on Docker-based containerized applications.
  • Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, jQuery, and AngularJS.
  • Experience with version control tools such as SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews.
  • Experience with application servers such as JBoss, Tomcat, WebLogic, and IBM WebSphere.
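
The Hive UDF work called out above generally follows the pattern sketched below. This is a minimal illustration only, written in Scala for consistency with the other sketches in this resume; the class name, function name, and column usage are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical Hive UDF: trims and upper-cases a string column,
// e.g. SELECT normalize_code(product_code) FROM products;
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

Once packaged into a JAR, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_code AS '<package>.NormalizeCode'.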

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy

Hadoop Distributions: Cloudera, MapR, Hortonworks

Languages: Java, Scala, Python, SQL, HTML, JavaScript and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

Web Design Tools: HTML, JavaScript, jQuery, CSS and AngularJS

Development/Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web Servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

Operating Systems: UNIX, Linux, Mac OS and Windows variants

ETL/BI Tools: Talend, Tableau

PROFESSIONAL EXPERIENCE:

Sr. Big Data Architect/Developer

Confidential, Minneapolis MN

Responsibilities:

  • Imported and exported data between the Hadoop data lake and relational systems such as Oracle and MySQL using Sqoop.
  • Created Kafka topics and partitions, wrote custom partitioner classes, and wrote Spark applications in Scala and Python (PySpark).
  • Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra (see the sketch after this list).
  • Built real-time data pipelines with Kafka Connect and Spark Streaming, imported Avro records through Apache Kafka, and performed analytics using Spark in Scala.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters, and processed and transferred data from Kafka into HDFS through the Spark Streaming APIs.
  • Used a highly available AWS environment to launch applications in different regions and implemented CloudFront with AWS Lambda to reduce latency.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which reads data from Kafka in near real time and persists it into Cassandra.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
  • Developed scripts that load data into Spark DataFrames and perform in-memory computation to generate the output response.
  • Migrated MapReduce jobs to Spark RDDs (Resilient Distributed Datasets) and created Spark jobs for better performance.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; built Scala Spark projects with sbt and executed them with spark-submit.
  • Built Cassandra nodes on AWS and set up the Cassandra cluster using Ansible automation.
  • Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS, and VPC.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
  • Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs in data pipelines; used ORC, Avro, Parquet, RCFile, and JSON file formats; and developed UDFs in Hive and Pig.
  • Developed Hive queries to analyze data and generate end reports for business users, including extensive transformation queries consumed by downstream models.
  • Used Spark and Spark SQL to read Parquet data and create Hive tables through the Scala API.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream log data from servers and sensors.
  • Wrote and tuned extensive Impala queries and created views for ad hoc and business processing.
  • Designed solutions for various system components using Microsoft Azure.
  • Wrote a generic, reusable data-quality-check framework for the application using Impala.
  • Generated various marketing reports using Tableau with Hadoop as the data source.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala, initially prototyped in Python (PySpark).
  • Ingested data into Cassandra and consumed the ingested data from Cassandra into the Hadoop data lake.
  • Performed Cassandra data modeling, built efficient data structures, and wrote a Storm topology to emit data into Cassandra.
  • Worked with Kerberos authentication in Oozie workflows for Hive and Cassandra.
  • Developed complex Talend ETL jobs to migrate data from flat files into the database.
  • Extensively used Git as the code repository and VersionOne to manage the day-to-day agile development process and to track issues and blockers.
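
A minimal sketch of the Kafka-to-Cassandra streaming path described above, assuming the spark-streaming-kafka-0-10 integration and the DataStax Spark-Cassandra connector; the broker address, topic, keyspace, table, and column names are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._ // adds saveToCassandra on DStreams

object LearnerEventStream {
  // Hypothetical record shape: "learnerId,courseId,score"
  case class LearnerEvent(learnerId: String, courseId: String, score: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-event-stream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical Cassandra host

    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092", // hypothetical brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-stream",
      "auto.offset.reset"  -> "latest"
    )

    // Direct DStream from the hypothetical "learner-events" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
    )

    // Parse each Kafka record into a case class and persist the DStream to Cassandra.
    stream
      .map(_.value.split(","))
      .flatMap {
        case Array(id, course, score) => Seq(LearnerEvent(id, course, score.toDouble))
        case _                        => Seq.empty[LearnerEvent]
      }
      .saveToCassandra("learning", "learner_events",
        SomeColumns("learner_id", "course_id", "score"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The same parsed DStream could also be landed in HDFS (for example with saveAsTextFiles) for the Kafka-to-HDFS handoff mentioned above.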

Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, MapReduce, Git, HDFS, Cassandra, Apache Kafka, Storm, Linux, Tableau, Solr, Confluence, Jenkins, Jira, AWS S3, EMR, Redshift, Apache NiFi, PySpark.

Sr. Hadoop Developer/Architect

Confidential, Dallas TX

Responsibilities:

  • Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
  • Extensively used the Spark stack to develop preprocessing jobs that use the RDD, Dataset, and DataFrame APIs to transform data for upstream consumption (see the sketch after this list).
  • Developed real-time data processing applications in Scala and Python and implemented Apache Spark Streaming against sources such as Kafka, Flume, and JMS.
  • Extracted and enriched HBase data across multiple tables using joins in Spark and wrote APIs to load the processed data back into HBase tables.
  • Built on-premises data pipelines using Kafka and Spark Streaming fed from an API streaming gateway REST service, and replaced existing MapReduce programs with Spark applications written in Scala.
  • Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
  • Wrote Sqoop scripts to import data into Hive/HDFS from the RDBMS.
  • Configured a multi-node Hadoop cluster on Amazon EC2 Spot Instances to transfer data between Amazon S3 and HDFS and to direct input and output to the Hadoop MapReduce framework.
  • Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which in turn is consumed by the Storm application.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) logging stack on AWS and set up Spark on EMR to process large datasets stored in Amazon S3.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Used Talend to create workflows for processing data from multiple source systems, created sample flows in Talend and StreamSets with custom-coded JARs, and analyzed the performance of StreamSets and Kafka Streams.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed a fully customized framework using Python, shell scripts, Sqoop, and Hive, and an export framework using Python, Sqoop, Oracle, and MySQL.
  • Developed Hive queries to analyze the data in HDFS to identify issues and behavioral patterns.
  • Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions, and streamed the data in real time using Spark with Kafka for faster processing.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Used the Python pandas and NumPy modules for data analysis, scraping, and parsing.
  • Deployed applications using Jenkins integrated with Git version control.
  • Participated in regular production support for the analytics platform and used Rally for task and bug tracking.
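
A minimal sketch of the kind of Spark preprocessing job described above, moving from a raw DataFrame to a typed Dataset for upstream consumption; the input/output paths, schema, and field names are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record shape for the curated output.
case class Event(userId: String, eventType: String, eventTs: java.sql.Timestamp)

object PreprocessEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("preprocess-events")
      .getOrCreate()
    import spark.implicits._

    // Raw JSON landed by the ingestion layer (hypothetical location).
    val raw = spark.read.json("hdfs:///data/raw/events")

    // DataFrame -> typed Dataset: drop bad rows, rename/cast columns, de-duplicate.
    val events: Dataset[Event] = raw
      .where(col("user_id").isNotNull && col("event_type").isNotNull)
      .select(
        col("user_id").as("userId"),
        col("event_type").as("eventType"),
        col("ts").cast("timestamp").as("eventTs"))
      .dropDuplicates("userId", "eventTs")
      .as[Event]

    // Write the curated Dataset as Parquet for upstream consumers.
    events.write.mode("overwrite").partitionBy("eventType")
      .parquet("hdfs:///data/curated/events")
  }
}
```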

Environment: MapR, Hadoop, HBase, HDFS, Python, AWS (S3, EMR), Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.

Sr. Hadoop Developer

Confidential - Omaha, NE

Responsibilities:

  • Moved files between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Created batch and real-time pipelines using Spark as the main processing framework.
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and HBase.
  • Migrated an existing on-premises application to AWS, used AWS services such as EC2 and S3 for small-dataset processing and storage, and maintained the Hadoop cluster on AWS EMR.
  • Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Performed Cloudera Hadoop upgrades and patches and installed ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Used Amazon CloudWatch to monitor and track AWS resources.
  • Migrated MapReduce programs into Spark transformations using Spark with Scala.
  • Worked with Apache Spark, a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Designed and developed an automation framework using Python and shell scripting, and wrote a Java API for AWS Lambda to manage some of the AWS services.
  • Designed the reporting application that uses Spark SQL to fetch data and generate reports on HBase.
  • Extensively used Spark SQL and PySpark APIs to query and transform data residing in Hive, and implemented sample Spark programs in Python using PySpark (see the sketch after this list).
  • Developed the data pipeline using Sqoop, Flume, and Pig to extract data from web logs and store it in HDFS.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
  • Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie, and loaded data into HBase using both bulk and non-bulk loads.
  • Used JIRA for bug tracking and CVS for version control.
  • Loaded pre-created HFiles into HBase for faster access to a large customer base without taking a performance hit.
  • Used ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
  • Used Oozie operational services for batch processing and for scheduling workflows dynamically.
  • Worked with the Scrum team to deliver agreed user stories on time in every sprint.
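
A minimal Scala sketch of the Hive-backed Spark SQL reporting pattern referenced above (the same work is also described in PySpark); the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object WeblogReport {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("weblog-report")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table populated by the Sqoop/Flume web-log pipeline.
    val report = spark.sql(
      """
        |SELECT region, to_date(view_ts) AS view_date, COUNT(*) AS page_views
        |FROM weblogs.page_views
        |GROUP BY region, to_date(view_ts)
      """.stripMargin)

    // Persist the aggregate back to Hive for the reporting layer to pick up.
    report.write.mode("overwrite").saveAsTable("weblogs.daily_page_views")
  }
}
```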

Environment: AWS (EMR, EC2, S3), Cloudera, MapReduce, Pig, Hive, Sqoop, Flume, PySpark, Spark, Scala, Java, HBase, Apache Avro, Oozie, ZooKeeper, Elasticsearch, Kafka, Python, JIRA, CVS and Eclipse.

Sr. Java/Hadoop Developer

Confidential - New York, NY

Responsibilities:

  • Coded front-end components using HTML, JavaScript, and jQuery; back-end components using Java, Spring, and Hibernate; service-oriented components using RESTful and SOAP-based web services; and rules-based components using JBoss Drools.
  • Involved in the design and development phases of the Software Development Life Cycle (SDLC).
  • Wrote SQL queries and stored procedures for multiple databases, including Oracle and SQL Server.
  • Launched and set up the Hadoop/HBase cluster, including configuring the different components of the Hadoop and HBase cluster.
  • Integrated Spring dependency injection across the different layers of the application, with Hibernate as the O/R mapping tool, for rapid development and ease of maintenance.
  • Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.
  • Wrote stored procedures using PL/SQL and performed query optimization to achieve faster indexing and make the system more scalable.
  • Wrote MapReduce jobs in Java for data processing after installing and configuring Hadoop and HDFS.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
  • Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mappings, message resource bundles, and form controllers, and used JNDI lookups for J2EE components.
  • Implemented connectivity to the database server using JDBC (see the sketch after this list).
  • Developed RESTful web services using Spring IoC to give users a way to run the job and generate a daily status report.
  • Developed and exposed SOAP web services using JAX-WS, WSDL, Axis, JAXP, and JAXB.
  • Configured domains in the production, development, and testing environments using the configuration wizard.
  • Extensively involved in installing and configuring Cloudera Distribution Hadoop (CDH) 2 and 3, including the NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
  • Created a SOAP handler to enable authentication and audit logging during web service calls, and created service-layer APIs and domain objects using Struts.
  • Used AJAX and JavaScript for validations and for integrating business server-side components on the client side within the browser.
  • Used RESTful services to interact with the client by providing RESTful URL mappings.
  • Implemented the project using the Agile Scrum methodology, participating in daily stand-up meetings, sprint showcases, and sprint retrospectives.
  • Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
  • Developed a Dojo-based front end, including forms and controls, and programmed event handling.
  • Used XSLT to transform XML data structures into HTML pages.
  • Deployed EJB components on Tomcat, used the JDBC API for interaction with the Oracle DB, and developed the UI panels using JSF, XHTML, CSS, Dojo, and jQuery.
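
The JDBC connectivity bullet above follows the standard java.sql pattern; a minimal sketch is shown below, written in Scala for consistency with the other sketches in this resume, with hypothetical connection details, table, and query.

```scala
import java.sql.DriverManager

object OrderStatusLookup {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle connection string; credentials come from the environment.
    val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    val conn = DriverManager.getConnection(url,
      sys.env.getOrElse("DB_USER", "scott"),
      sys.env.getOrElse("DB_PASSWORD", "tiger"))
    try {
      // Parameterized query: avoids SQL injection and lets the driver reuse the plan.
      val stmt = conn.prepareStatement(
        "SELECT order_id, status FROM orders WHERE status = ?")
      stmt.setString(1, "OPEN")
      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(s"${rs.getLong("order_id")} -> ${rs.getString("status")}")
      }
    } finally {
      conn.close()
    }
  }
}
```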

Environment: Java 6 - JDK 1.6, JEE, Spring 3.1 framework, Spring Model View Controller (MVC), Java Server Pages (JSP) 2.0, Servlets 3.0, JDBC4.0, AJAX, Web services, Rest API, JSON, Java Beans, jQuery, JavaScript, Oracle 10g, JUnit, HTML Unit, XSLT, HTML/DHTML.

Java Developer

Confidential

Responsibilities:

  • Created the database, user, environment, activity, and class diagrams for the project (UML).
  • Implemented the database using the Oracle database engine.
  • Designed and developed a fully functional, generic, n-tiered J2EE application platform in an Oracle-technology-driven environment. The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
  • Created entity objects (business rules and policies, validation logic, default-value logic, security).
  • Created view objects, view links, association objects, and application modules with data validation rules (exposing linked views in an application module), LOVs, dropdowns, value defaulting, and transaction management features.
  • Developed web applications using J2EE: JSP, Servlets, JDBC, JavaBeans, Struts, Ajax, JSF, JSTL, custom tags, EJB, JNDI, Hibernate, Ant, JUnit, Apache Log4j, web services, and Message Queue (MQ).
  • Designed the GUI prototype using ADF 11g GUI components before finalizing it for development.
  • Used Cascading Style Sheets (CSS) to attain uniformity across all pages.
  • Created reusable components (ADF libraries and ADF task flows).
  • Used version control systems such as CVS, PVCS, and Rational ClearCase, and created modules using bounded and unbounded task flows.
  • Generated WSDL (web services) and created workflows using BPEL.
  • Handled the AJAX functions (partial trigger, partial submit, auto submit) and created the skin for the layout.

Environment: Core Java, Servlets, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J) 11g, web services using Oracle SOA (BPEL), Oracle WebLogic.
