Sr. Big Data Developer Resume
Irving, TX
SUMMARY:
- 8+ years of proven, results-oriented experience delivering IT solutions using a range of technologies. Highly experienced in building a ground-up NextGen Analytics/BI platform using cloud-native technologies such as Cassandra, Hadoop, Apache Spark, Scala, Solr, Kafka, Sqoop and DataStax Enterprise. Over this period, developed, delivered and supported various IT implementations in the e-commerce and banking sectors. Experienced in building high-performing systems using RDBMSs such as Oracle and Oracle ERP.
- Experience in application development and design using emerging technologies such as Hadoop, NoSQL, Big Data and Java/J2EE. 4 years of strong working experience with Big Data and the Hadoop ecosystem.
- Comprehensive experience in Big Data processing using ecosystem components (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka, HBase, Oozie and ZooKeeper).
- Worked extensively on installing and configuring Hadoop ecosystem components such as Hive, Sqoop, Pig, HBase, ZooKeeper and Flume.
- Analyzed structured and unstructured data using the Apache Hadoop API.
- Good working knowledge of Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks. Good knowledge of the MapR distribution and Amazon EMR.
- Valuable experience in the practical implementation of cloud technologies including IAM and Amazon cloud services such as Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda and EBS.
- Work experience with cloud infrastructure like Amazon Web Services (AWS).
- Hands-on experience in provisioning and managing multi-tenant Cassandra clusters in a public cloud environment on Amazon Web Services (AWS): EC2, S3, Lambda, Route 53.
- Good knowledge of Amazon AWS concepts such as EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Expertise in designing and deploying Hadoop clusters and various Big Data analytics tools including Pig, Hive, HBase, Oozie, Sqoop, Flume and Spark with the Cloudera distribution.
- Expertise in writing MapReduce programs for validating data.
- Worked with the Hive data warehouse: creating tables, distributing data by implementing partitioning and bucketing, and optimizing and writing HiveQL queries.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Worked as administrator for Pig, Hive and HBase, which included installing updates and patches.
- Used Conviva and Spark MLlib for predictive intelligence, customer segmentation and smooth maintenance in Spark Streaming.
- Developed complex Talend ETL jobs to migrate data from flat files to databases.
- In-depth understanding of Spark architecture including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
- Extensively used Scala and Spark to improve the performance and optimization of existing queries in Hadoop and Hive using Spark context, Spark SQL (DataFrames and Datasets) and pair RDDs (see the sketch following this summary).
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Experience in using accumulator variables, broadcast variables and RDD caching in Spark.
- Good knowledge of end-to-end data security and governance within the Hadoop platform using Kerberos.
- Hands-on experience in application development with Java, RDBMS and UNIX shell scripting.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Worked with Apache NiFi flows to convert raw XML data into JSON and Avro.
- Good understanding of MPP databases such as HP Vertica and Impala.
- Worked on administering and configuring Hadoop clusters for the Cloudera distribution.
- Knowledge of YARN configuration.
- Good understanding of Cassandra and MongoDB implementations.
- Experienced in implementing projects with Agile and Waterfall methodologies.
- Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
- Excellent communication, interpersonal and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management and customers.
- Hands-on experience developing applications with Java, J2EE (Servlets, JSP, EJB), SOAP web services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle 10g and MS SQL Server RDBMS.
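A minimal sketch of the kind of Hive-to-Spark SQL rewrite referenced in the summary above, using the DataFrame API in Java; the table and column names (sales, region, amount) are illustrative assumptions, not taken from any project described here.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

public class HiveToSparkSql {
    public static void main(String[] args) {
        // SparkSession with Hive support so existing Hive tables are visible.
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-spark-sql")
                .enableHiveSupport()
                .getOrCreate();

        // Equivalent of the Hive query:
        //   SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region
        Dataset<Row> totals = spark.table("sales")
                .groupBy(col("region"))
                .agg(sum(col("amount")).alias("total_amount"));

        // Write the result back as a managed table for downstream consumers.
        totals.write().mode("overwrite").saveAsTable("sales_totals_by_region");
        spark.stop();
    }
}
```

The same query could also be issued through spark.sql(...); the DataFrame form simply makes the transformation chain explicit for the Catalyst optimizer.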
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Cloudera Distribution, YARN, Pig, Hive, Scala, Sqoop, Flume, HBase, Cassandra, Oozie, ZooKeeper, Ambari, Mahout, MongoDB, Kafka, Spark, Impala
Hadoop Distributions: Cloudera (CDH3, CDH4 and CDH5), Hortonworks, MapR and Apache
Operating Systems: Windows, Linux, Unix
Languages: Java, J2EE, SQL, Python, Scala, XML and C/C++
Databases: Oracle, SQL Server, MySQL, IBM DB2
Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT
IDEs: IBM RAD, Eclipse, IntelliJ, NetBeans
Tools: TOAD, SQL Developer, ANT
Methodology: Agile and Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Irving, TX
Sr. Big Data Developer
Responsibilities:
- Designed batch processing jobs using Apache Spark to increase speeds ten-fold compared to MapReduce jobs.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Developed Java APIs for retrieval and analysis of NoSQL databases such as HBase and Cassandra.
- Loaded data to and from Cassandra using the Spark-Cassandra connector.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets its data from Kafka in near real time and persists it to Cassandra.
- Good experience with AWS Elastic Block Store (EBS), its different volume types, and selecting the appropriate EBS volume type based on requirements.
- Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Implemented a variety of AWS computing and networking services to meet application needs.
- Developed a data pipeline using Kafka to store data into HDFS.
- Developed multiple Kafka producers and consumers using both the low-level and high-level APIs (see the sketch after this list).
- Worked on Big Data integration and analytics based on Hadoop, Spark, Kafka and webMethods technologies.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark written in Scala.
- Implemented PySpark and Spark SQL for faster testing and processing of data.
- Worked extensively on the Spark Core and Spark SQL modules.
- Developed MapReduce programs to parse raw data and create intermediate data that was then loaded into partitioned Hive tables.
- Designed Cassandra data schemas and implemented real-time data pipelines with a Kafka messaging system and a Flink streaming layer sinking to Cassandra.
- Responsible for handling different data formats such as Avro, Parquet and ORC, and different compression codecs (GZIP, Snappy, LZO).
- Configured AWS EC2 instances in a VPC network, managed security through IAM and monitored server health through CloudWatch.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Used the WebHDFS REST API to make HTTP GET, PUT, POST and DELETE requests from the web server to perform analytics on the data lake.
- Used codecs such as Snappy and LZO when storing data in HDFS to improve performance.
- Experience with Hortonworks distribution.
- Used Hortonworks Ambari for the job browser, file browser and running Hive and Impala queries.
- Collected XML and JSON data from different sources, developed Spark APIs to perform inserts and updates in Hive tables, and made data available in Hive and Impala as per business requirements.
- Worked on customizing MapReduce code in Amazon EMR using Hive and Pig.
- Involved in daily Scrum meetings to discuss development progress and was active in making the meetings more productive.
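A minimal sketch of the Kafka producer/consumer pattern mentioned above, using the standard Kafka Java client; the broker address, topic name, group id and message payload are illustrative assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaPipelineSketch {
    public static void main(String[] args) {
        Properties prodProps = new Properties();
        prodProps.put("bootstrap.servers", "localhost:9092");
        prodProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        prodProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Publish a raw event to the ingest topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prodProps)) {
            producer.send(new ProducerRecord<>("learner-events", "learner-42", "{\"event\":\"login\"}"));
        }

        Properties consProps = new Properties();
        consProps.put("bootstrap.servers", "localhost:9092");
        consProps.put("group.id", "learner-model-builder");
        consProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Consume the events back from the same topic.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consProps)) {
            consumer.subscribe(Collections.singletonList("learner-events"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("key=%s value=%s%n", record.key(), record.value());
            }
        }
    }
}
```

In a pipeline like the one described above, the consumer side is typically replaced by a Spark Streaming job that lands the events in HDFS or Cassandra in near real time.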
Environment: Hadoop, Java, AWS, MapReduce, HDFS, Hive, Pig, Spark, Scala, Kafka, MySQL, Cassandra, HBase, Eclipse, SQL scripting, Linux shell scripting.
Confidential, Austin, TX
Sr. Hadoop Developer
Responsibilities:
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Used Spark SQL and Spark Streaming to process structured and streaming data.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Worked on MongoDB for distributed storage and processing.
- Responsible for using a Flume sink to drain data from the Flume channel and deposit it into NoSQL databases such as MongoDB.
- Extracted BSON files from MongoDB, placed them in HDFS and processed them.
- Implemented collections and the aggregation framework in MongoDB (see the sketch after this list).
- Implemented B-tree indexing on the data files stored in MongoDB.
- Implemented a Flume NG MongoDB sink to load JSON-styled data into MongoDB.
- Good knowledge of MongoDB CRUD (Create, Read, Update and Delete) operations.
- Experience writing custom aggregate functions using Spark SQL and performing interactive querying.
- Cassandra implementation using the DataStax Java API.
- Worked on the Cloudera distribution and deployed on AWS EC2 instances.
- Hands-on experience with Cloudera Hue to import data through the graphical user interface.
- Experienced in querying HBase using Impala.
- Performed analysis on implementing Spark using Scala.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets its data from Kafka in near real time.
- Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
- Deployed the project on Amazon EMR with S3 connectivity as backup storage.
- Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Worked on production servers on the Amazon cloud (EC2, EBS, S3, Lambda and Route 53).
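A minimal sketch of the MongoDB aggregation framework usage mentioned above, written against the MongoDB Java driver; the database, collection and field names (analytics, visits, status, page) are illustrative assumptions.

```java
import java.util.Arrays;
import org.bson.Document;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;

public class MongoAggregationSketch {
    public static void main(String[] args) {
        // Connect to a local MongoDB instance (connection string is illustrative).
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> visits =
                    client.getDatabase("analytics").getCollection("visits");

            // Aggregation pipeline: keep completed sessions, then count visits per page.
            for (Document doc : visits.aggregate(Arrays.asList(
                    Aggregates.match(Filters.eq("status", "completed")),
                    Aggregates.group("$page", Accumulators.sum("visitCount", 1))))) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```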
Environment: Hortonworks Hadoop, HDFS, Hive, MapReduce, Spark, Java, HBase, AWS Cloud, Pig, Sqoop, Shell Scripts, MongoDB, Oozie, MySQL, ETL, Talend, Kafka.
Confidential, Washington DC.
Hadoop ETL developer
Responsibilities:
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume, and stored the data in HDFS for analysis.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
- Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability and performance.
- Involved in loading data from UNIX/LINUX file system to HDFS.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Wrote Hive UDFs to extract data from staging tables.
- Worked on a POC to bring data into HDFS and Hive.
- Experience in creating integration between Hive and HBase.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types (a simple UDF sketch follows this list).
- Used different data formats (text format and ORC format) while loading data into HDFS.
- Maintained the server log files.
- Wrote Pig and Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data. Also have hands-on experience with Pig and Hive user-defined functions (UDFs).
- Loaded all the data from the existing SQL Server into HDFS using Sqoop.
- Loaded cache data into HBase using Sqoop.
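A minimal sketch of a Hive UDF in the style referenced above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and the normalization behavior are illustrative assumptions rather than the exact complex-type UDFs described.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeField extends UDF {
    // Trims and upper-cases a raw log field; returns null for null input.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Once packaged into a jar, such a UDF is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.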
Environment: MapReduce, HDFS, Hive, Flume, Pig, Python, MySQL, Oracle, Unix/Linux.
Confidential, CA
Analytic Data engineer
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase.
- Deployed the Hadoop cluster on the designated nodes.
- Managed and scheduled jobs on the Hadoop cluster.
- Managed Hadoop clusters: monitoring, maintenance and setup.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Involved in loading data from LINUX file system to HDFS.
- Handled importing data from various data sources and performed transformations using Hive and Pig to load data into HDFS.
- Developed multiple MapReduce jobs in Java for data cleaning and access (see the sketch after this list).
- Developed Custom Input Formats in MapReduce jobs to handle custom file formats and to convert them into key-value pairs.
- Installed Hadoop, MapReduce, and HDFS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Implemented NameNode backup using NFS for high availability.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, compressed CSV, etc.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Created Hive external tables, loaded the data into the tables and queried the data using HQL.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website.
- Converted Oracle table components to Teradata table components in Abilities Graphs.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
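A minimal sketch of a MapReduce job of the kind described above (counting page views from raw web logs); the space-delimited log layout and the field index of the requested page are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageViewCount {

    public static class PageMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assume space-delimited web log lines with the requested page in field 6.
            String[] fields = value.toString().split(" ");
            if (fields.length > 6) {
                context.write(new Text(fields[6]), ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "page-view-count");
        job.setJarByClass(PageViewCount.class);
        job.setMapperClass(PageMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```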
Environment: Hadoop, Map Reduce, HDFS, Pig, HBase, Hive, Java, SQL Server, Python, Linux, Cloudera.
Confidential
Java/J2EE developer
Responsibilities:
- Responsible for coding Java and J2EE components using EJB, JSP, Servlets, JDBC APIs, XML, XSL/XSLT, HTML, DHTML, the Collections Framework, an MVC framework and JavaScript.
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Interacted with business users and developed custom reports based on the defined criteria.
- Gathered requirements and collected information; analyzed the gathered information to prepare a detailed work plan and task breakdown structure.
- Extensively wrote core Java and multi-threading code in the application.
- Wrote JDBC statements, prepared statements and callable statements in Java, JSPs and Servlets (see the sketch after this list).
- Used the Eclipse IDE for all coding in Java, Servlets and JSPs.
- Involved in discussions with the business analysts for bug validation and fixing.
- Consumed web services (WSDL, SOAP and UDDI) from a third party for authorizing payments to/from customers.
- Developed the DAO layer for the application using Hibernate and JDBC.
- Implemented Restful web services with JAX-RS (Jersey).
- Used JMS (Java Message Service) to send, receive and read messages in the application.
- Used databases such as Oracle 10g and DB2, and wrote complex SQL statements, PL/SQL procedures and cursors to retrieve data from the database.
- Involved in preparing Ant build scripts (XML-based), deployments, integration and configuration management of all application modules, and performing unit testing with JUnit along with system and integration testing of the whole application.
- Worked on Servlets, JSP, Struts, JDBC and JavaScript under an MVC architecture.
- Extensively used Eclipse and RAD in developing and debugging the application.
- Used Maven as a project build and dependency management tool.
- Involved in performance tuning wherever there was latency or delay in code execution.
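A minimal sketch of the JDBC prepared/callable statement usage mentioned above; the connection URL, credentials, table name and stored procedure name are illustrative assumptions.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PaymentDao {
    // Assumed connection details for an Oracle instance.
    private final String url = "jdbc:oracle:thin:@//localhost:1521/ORCL";

    // Looks up a customer's balance with a prepared statement.
    public double findBalance(long customerId) throws SQLException {
        String sql = "SELECT balance FROM accounts WHERE customer_id = ?";
        try (Connection conn = DriverManager.getConnection(url, "app", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getDouble("balance") : 0.0;
            }
        }
    }

    // Invokes an assumed PL/SQL procedure via a callable statement.
    public void authorizePayment(long customerId, double amount) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, "app", "secret");
             CallableStatement cs = conn.prepareCall("{call authorize_payment(?, ?)}")) {
            cs.setLong(1, customerId);
            cs.setDouble(2, amount);
            cs.execute();
        }
    }
}
```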
Environment: Java/J2EE, Eclipse, WebLogic Application Server, Oracle, JSP, HTML, JavaScript, JMS, Servlets, UML, XML, Struts, Web Services, WSDL, SOAP, JUnit.
Confidential
Java developer
Responsibilities:
- Also helped develop UML diagrams: use case, activity, sequence and class diagrams.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Developed the application on a Struts MVC architecture utilizing Action classes, ActionForms and validations.
- Was responsible for implementing various J2EE design patterns such as Service Locator, Business Delegate, Session Facade and Factory.
- Involved in the design and decision making for Hibernate O/R mapping.
- Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
- Configured queues in WebLogic Server where messages were published using the JMS API.
- Wrote and manipulated database queries and stored procedures for Oracle 9i.
- Developed the services by following full-blown Test-Driven Development.
- Interacting with the system analysts, business users for design & requirement clarifications.
- Designed front-end pages using JSP, HTML, AngularJS, jQuery, JavaScript and CSS, with Ajax calls to fetch the required data from the backend.
- Designed and developed the application using Spring MVC (see the sketch after this list).
- Used the Spring Framework IoC (Inversion of Control) pattern to manage relationships between application components.
- Used JIRA as a bug-tracking tool for updating bug reports.
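A minimal sketch of an annotation-driven Spring MVC controller of the kind described above; the URL mapping, view name and the OrderService collaborator are illustrative assumptions.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class OrderController {

    // Assumed collaborator; in the real application this would be a Spring-managed bean.
    public interface OrderService {
        Object findById(long id);
    }

    private final OrderService orderService;

    // Constructor injection wired by the Spring IoC container.
    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    // Maps GET /orders/{id} to this handler and renders the "orderDetail" JSP view.
    @RequestMapping(value = "/orders/{id}", method = RequestMethod.GET)
    public String viewOrder(@PathVariable("id") long id, Model model) {
        model.addAttribute("order", orderService.findById(id));
        return "orderDetail";
    }
}
```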
Environment: Java, J2EE, Servlets, JSP, JDBC, JavaScript, Oracle, Eclipse RCP, JIRA, Unix/Windows.
