AWS/Hadoop Developer Resume
San Francisco, CA
SUMMARY:
- Over 8 years of IT experience in analysis, design, and development in Scala, Spark, Hadoop, and HDFS environments, with additional experience in Java and J2EE.
- Experienced in developing and implementing MapReduce programs on Hadoop to meet project requirements.
- Excellent experience with Scala, Apache Spark, Spark Streaming, pattern matching, and MapReduce.
- Developed ETL test scripts based on technical specifications, data design documents, and source-to-target mappings.
- Able to build deployments on AWS with build scripts (Boto3 and the AWS CLI) and automated solutions using shell and Python (see the sketch after this list).
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancing, Auto Scaling groups, and the AWS CLI.
- Experienced in installing, configuring, and administering Hadoop clusters on major Hadoop distributions (Confidential, Confidential).
- Created and configured new batch jobs in the Denodo scheduler with email notification capabilities.
- Wrote scripts to automate data load and performed data transformation operations.
- Implemented cluster settings for multiple Denodo nodes and set up load balancing to improve performance.
- Worked with the diligence team to explore whether NiFi was a feasible option for our solution.
- Experienced in working with different data sources like Flat files, Spreadsheet files, log files and Databases.
- Experienced in working with Flume to load log data from multiple sources directly into HDFS.
- Excellent experience with Apache Hadoop ecosystem components such as the Hadoop Distributed File System (HDFS), MapReduce, Sqoop, Apache Spark, and Scala.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases and with core Java concepts such as OOP, multithreading, collections, and I/O.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map-Reduce and Pig jobs.
- Experience with the MapReduce and Pig programming models, and with installation and configuration of Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
- Experience in managing and reviewing Hadoop log files using Flume and Kafka; developed Pig and Hive UDFs to pre-process data for analysis.
- Experience with NoSQL databases such as HBase and Cassandra.
- Experience in UNIX shell scripting; proficient in Linux/UNIX and Windows operating systems.
- Experienced in setting up data gathering tools such as Flume and Sqoop.
- Extensive knowledge of ZooKeeper for various types of centralized configuration.
- Knowledge of monitoring and managing Hadoop clusters using Confidential.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experienced in analyzing, designing, and developing ETL strategies and processes, and in writing ETL specifications.
- Created YAML manifests to push applications to Pivotal Cloud Foundry and deployed Spark applications and Java web services there; able to be a valuable contributor to the company.
- Experienced in building applications using Java, Python, and UNIX shell scripting.
- Good interpersonal, communication, and problem-solving skills; a motivated team player.
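Below is a minimal sketch of the kind of Boto3 automation referenced above; the region, AMI ID, bucket, tag values, and function names are hypothetical placeholders rather than details from any specific engagement.

```python
# Hypothetical Boto3 provisioning/automation sketch (all names, IDs, and paths are placeholders).
import boto3

ec2 = boto3.resource("ec2", region_name="us-west-1")
s3 = boto3.client("s3", region_name="us-west-1")

def launch_worker(ami_id="ami-0123456789abcdef0", instance_type="m5.xlarge"):
    """Launch a tagged EC2 instance for a batch/ETL workload."""
    instances = ec2.create_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "etl-worker"}],
        }],
    )
    return instances[0].id

def upload_artifact(local_path, bucket="example-deploy-bucket", key="jobs/etl.zip"):
    """Push a build artifact to S3 so downstream jobs (e.g., EMR steps) can pick it up."""
    s3.upload_file(local_path, bucket, key)

if __name__ == "__main__":
    print(launch_worker())
```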
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Oozie, ZooKeeper, Spark, Storm, Kafka
Java & J2EE Technologies: Core Java
IDEs: Eclipse, NetBeans
Big data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring
Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server
Web Servers: Web Logic, Web Sphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP, FTP
ETL Tools: Talend, Informatica, Pentaho, SSRS, SSIS, BO, Crystal Reports, Cognos
Testing: WinRunner, LoadRunner, QTP
WORK EXPERIENCE:
Confidential, San Francisco, CA
AWS/Hadoop Developer
Responsibilities:
- Implemented Hadoop cluster on Confidential and assisted with performance tuning, monitoring and troubleshooting.
- Installed and configured MapReduce, Hive, and HDFS.
- Created, altered, and deleted Kafka topics as required; tuned performance using partitioning and bucketing of Impala tables.
- Involved in file movement between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Created data partitions on large data sets in S3 and DDL on partitioned data.
- Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
- Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.
- On the edge device, responsible for sending responses as serialized objects to the cloud service over REST.
- Created YAML manifests to push applications to Pivotal Cloud Foundry; deployed Spark applications and Java web services there.
- Implemented rapid provisioning and lifecycle management using Amazon EC2, Chef, and custom Ruby/Bash scripts.
- Created RDDs in Spark and extracted data from the data warehouse into Spark RDDs.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Involved in Configuring core-site.xml and mapred-site.xml according to the multi node cluster environment.
- Involved in developing Pig UDFs to pre-process data for analysis. Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Designed and configured a Kafka cluster to accommodate a heavy throughput of one million messages per second.
- Created HBase tables to store various formats of PII data coming from different portfolios; processed the data using Spark and used Avro and Parquet file formats for serialization.
- Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Designed and engineered a NiFi data movement framework for structured and unstructured data sources.
- Engineered and implemented a security framework for Apache NiFi with LDAP and SSL layers; secured and encrypted the NiFi UI and enabled LDAP.
- Involved in setting up HBase on HDFS and in creating Hive tables, loading data, and writing Hive queries; implemented partitioning, dynamic partitions, and buckets in Hive.
- Wrote Hive queries for data analysis to meet business requirements. Created HBase tables to store various formats of incoming data from different portfolios.
- Used Hive partitioning and bucketing to optimize Hive tables, creating around 20,000 partitions. Imported and exported data into HDFS and Hive using Sqoop.
- Consumed data from Kafka queues using Spark (see the sketch after this list). Configured different topologies for the Spark cluster and deployed them on a regular basis.
- Involved in loading data from the Linux file system to HDFS and in importing and exporting data into HDFS and Hive using Sqoop.
- Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
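A minimal PySpark sketch of the Kafka-to-HDFS/S3 flow described above, assuming Spark Structured Streaming with the spark-sql-kafka connector on the classpath; the broker addresses, topic name, and paths are hypothetical placeholders.

```python
# Hypothetical sketch: consume a Kafka topic with Spark Structured Streaming
# and land the records as Parquet (broker, topic, and paths are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-parquet").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "claims-events")
       .option("startingOffsets", "latest")
       .load())

# Kafka rows expose binary key/value columns; cast the payload to string for downstream parsing.
events = raw.select(col("key").cast("string"), col("value").cast("string"))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/landing/claims/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/claims/")
         .start())

query.awaitTermination()
```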
Environment: Hadoop, MapReduce, AWS, EMR, HBase, NiFi, Hive, Impala, Pig, Sqoop, HDFS, Flume, Oozie, Spark, Spark SQL, Spark Streaming, Scala, Cloud Foundry, Kafka, and Confidential.
Confidential, Houston, TX
Big Data/ Talend Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked extensively with Flume for importing social media data.
- Used Presto for interactive queries against several internal data stores, including a 300 PB data warehouse. Created Talend mappings to populate data into staging, dimension, and fact tables.
- Experience integrating Denodo with Oracle, SQL Server, and MySQL databases using JDBC.
- Designed and developed high-quality integration solutions using the Denodo virtualization tool to read data from multiple sources, including Oracle, Hadoop, and MySQL.
- Experience in Installing and Configuring Virtual Data Port (VDP) Database setup in Denodo.
- Worked on a project to retrieve log messages using Spark Streaming.
- Designed Oozie jobs for automated processing of similar data; collected the data using Spark Streaming.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Pig scripts to reduce the need for extensive custom coding.
- Worked with Spark Streaming to ingest data into the Spark engine.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Handled importing of data from various sources using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS (see the sketch after this list).
- Hands-on experience productionizing Hadoop applications: administration, configuration management, monitoring, debugging, and performance tuning.
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
- Processed the data using Spark.
- Developed Spark scripts in Scala, writing custom RDDs for data transformations and performing actions on RDDs. Translated high-level design specifications into simple ETL coding and mapping standards.
- Provided cluster coordination services through ZooKeeper. Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Developed complex Talend job mappings to load data from various sources using different components. Designed, developed, and implemented solutions using Talend Integration Suite.
- Partitioned data streams using Kafka. Designed and configured a Kafka cluster to accommodate a heavy throughput of one million messages per second. Used the Kafka 0.8.3 producer API to produce messages.
- Built big data solutions using HBase, handling millions of records across different data trends and exporting them to Hive.
- Developed Hive scripts to transform data and load it into target systems for use by data analysts in reporting.
- Tested source data before processing; familiar with automated monitoring tools such as Nagios.
- Used Oozie as the workflow engine and Falcon for job scheduling. Debugged technical issues and resolved errors.
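A minimal sketch of the kind of scripted MySQL-to-HDFS extraction with Sqoop noted above; the JDBC URL, credentials file, table, and target directory are hypothetical placeholders.

```python
# Hypothetical sketch: drive a Sqoop import from MySQL into HDFS from a Python wrapper
# (connection string, credentials file, table, and target directory are placeholders).
import subprocess

def sqoop_import(table, target_dir, mappers=4):
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host:3306/sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.mysql.password",  # keep the password off the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]
    # Raise if Sqoop exits non-zero so the calling scheduler (e.g., an Oozie shell action) fails loudly.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sqoop_import("orders", "/data/raw/orders")
```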
Environment: Hadoop, Talend ETL tool, Falcon, HDFS, Denodo, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Flume, ZooKeeper, Java, SQL, scripting, Spark, Kafka.
Confidential, Plano, TX
Big Data/ Hadoop Developer
Responsibilities:
- Continuously monitored and managed the Hadoop cluster through Confidential Manager.
- Upgraded the Hadoop cluster from CDH 3 to CDH 4, set up a High Availability cluster, and integrated Hive with existing applications.
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, the HBase database, and Sqoop.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files. Installed and deployed IBM WebSphere.
- Implemented the NoSQL database HBase and managed the other tools and processes running on YARN.
- Wrote Hive queries to fetch data from HBase and transfer it to HDFS through Hive.
- Tested raw data and executed performance scripts. Shared responsibility for administration of Hadoop, Hive, and Pig. Analyzed, validated, and documented the changed records for the IBM web application.
- Experience in deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere, and Oracle Application Server.
- Responsible for developing MapReduce programs using text analytics and pattern-matching algorithms.
- Involved in importing data from various client systems, such as Remedy, Altiris, Cherwell, and OTRS, into HDFS.
- Set up and benchmarked Hadoop/HBase clusters for internal use. Assisted the development team in installing single-node Hadoop 224 on a local machine.
- Participated in architectural and design decisions with the respective teams. Developed an in-memory data grid solution across conventional and cloud environments using Oracle Coherence.
- Worked with customers to develop and support solutions using our in-memory data grid product.
- Used Pig for transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
- Performed analysis with the data visualization tool Tableau and wrote Pig scripts for data processing.
- Developed Spark code in Scala and used Spark SQL/Streaming for faster processing and testing of data.
- Designed database and created tables, written the complex SQL Queries and stored procedures as per the requirements.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for dashboard reporting (see the sketch after this list); loaded the aggregated data into DB2 for the dashboard.
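A minimal PySpark sketch of computing dashboard metrics over a partitioned Hive table, as described above; the database, table, column names, and date are hypothetical placeholders, and the DB2 load is handled separately.

```python
# Hypothetical sketch: compute dashboard metrics from a partitioned/bucketed Hive table
# with Spark SQL (database, table, columns, and date are placeholders).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-metrics")
         .enableHiveSupport()
         .getOrCreate())

# Filtering on the partition column (load_date) limits the scan to the requested day.
daily_metrics = spark.sql("""
    SELECT region,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount,
           AVG(amount) AS avg_amount
    FROM   analytics.transactions
    WHERE  load_date = '2018-06-01'
    GROUP  BY region
""")

# Write the aggregates out for the reporting layer to pick up.
daily_metrics.write.mode("overwrite").parquet("/data/marts/daily_region_metrics")
```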
Environment: Big Data/Hadoop, Python, Java, Agile, Spark Streaming, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, DB2, NoSQL, HBase, IBM WebSphere, Tomcat, and Tableau.
Confidential, NYC
Hadoop Developer
Responsibilities:
- Developed Map-Reduce programs for data analysis and data cleaning.
- Installed and configured Confidential Data Platform 2.1-2.3.
- Implemented Big Data solutions including data acquisition, storage, transformation and analysis.
- Wrote MapReduce jobs to discover trends in data usage by users (see the sketch after this list).
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Provided quick response to ad hoc internal and external client requests for data.
- Loaded and transformed large sets of structured and unstructured data using Hadoop.
- Developed Pig scripts to reduce the need for extensive custom coding.
- Responsible for creating Hive tables, loading data and writing hive queries.
- Involved in loading data from Linux file system to HDFS.
- Experience in deploying applications on heterogeneous application servers: WebLogic and IBM WebSphere.
- Analyzed, validated, and documented the changed records for the IBM application.
- Excellent knowledge of the NoSQL databases MongoDB and Cassandra.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, streamed data using Flume, and loaded it into HDFS.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, Impala, ZooKeeper, and Pig jobs that run independently based on time and data availability.
- Developed simple to complex MapReduce jobs using Hive and Pig. Imported data from MySQL to HDFS on a regular basis using Sqoop. Developed scripts and batch jobs to schedule various Hadoop programs. Wrote Hive queries for data analysis to meet business requirements.
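A minimal sketch of a Hadoop Streaming mapper/reducer pair in Python for counting usage events per user, illustrating the kind of trend analysis described above; the input layout (tab-separated records with the user ID in the first column) is an assumption.

```python
# Hypothetical Hadoop Streaming sketch: count usage events per user
# (assumes tab-separated input with user_id in the first column).
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            # Emit user_id with a count of 1 for each log record.
            print(f"{fields[0]}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so counts for a user arrive contiguously.
    current_user, count = None, 0
    for line in sys.stdin:
        user, value = line.rstrip("\n").split("\t")
        if user != current_user:
            if current_user is not None:
                print(f"{current_user}\t{count}")
            current_user, count = user, 0
        count += int(value)
    if current_user is not None:
        print(f"{current_user}\t{count}")

if __name__ == "__main__":
    # Invoke as: usage_trends.py map  |  usage_trends.py reduce
    mapper() if sys.argv[1] == "map" else reducer()
```

Such a script would typically be shipped with the hadoop-streaming JAR, passing the same file as both the -mapper and -reducer commands (with "map" and "reduce" arguments) along with -input and -output paths.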
Environment: Confidential, Hadoop, Pig, Hive, Oozie, NoSQL, Sqoop, Flume, HDFS, HBase, MapReduce, MySQL, Hortonworks, Impala, Cassandra, MongoDB, IBM WebSphere, Tomcat, ZooKeeper.
Confidential, Peoria, IL
Java Developer
Responsibilities:
- Prepared high-level and low-level design documents applying relevant design patterns, with UML diagrams depicting component and class-level details.
- Interacted with system analysts and business users for design and requirement clarification.
- Developed web services using SOAP, SOA, WSDL, and Spring MVC, and developed DTDs and XSD schemas for XML parsing, processing, and design to communicate with the Active Directory application using a RESTful API.
- Developed JSPs according to requirements. Wrote AngularJS controllers, views, and services.
- Developed integration services using SOA, web services, SOAP, and WSDL.
- Designed, developed, and maintained the data layer using the Hibernate ORM framework.
- Involved in analysis, design, development, and production of the application and developed UML diagrams. Developed HTML reports for various modules as per requirements.
- Used the Spring framework's JMS support for writing to JMS queues and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
- Extensively used the Struts framework for MVC, UI design, and validations.
- Used the Struts Tiles library for web page layout and performed validations using the Struts validation framework.
- Created components to extract application messages stored in XML files. Used Ant for builds; the application was deployed on the JBoss application server.
- Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
Environment: Java, JDBC, Spring, JSP, JBoss, Servlets, Maven, Jenkins, Flex, HTML, AngularJS, Hibernate, JavaScript, Eclipse, Struts, SQL Server 2000.
Confidential
Java Project
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed UI-layer logic of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- Followed the Agile Scrum methodology for the development process.
- Developed prototype test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Experience in writing PL/SQL stored procedures, functions, triggers, Oracle reports, and complex SQL queries.
- Worked with JavaScript to perform client-side form validations. Introduced an innovative logging approach for all interdependent applications.
- Used Struts tag libraries as well as the Struts Tiles framework.
- Used JDBC with the Oracle thin (Type 4) driver to access the database for application optimization and efficiency. Created connections through JDBC and used JDBC statements to call stored procedures.
- Performed client-side validation using JavaScript.
- Used the Data Access Object pattern to make the application more flexible for future and legacy databases.
- Actively involved in tuning SQL queries for better performance.
- Developed the application using the Spring MVC framework.
- Used the Java Collections framework to transfer objects between the different layers of the application.
- Developed data mappings to create a communication bridge between various application interfaces using XML and XSL.
- Proficient in developing applications with exposure to Java, JSP, UML, Oracle (SQL, PL/SQL), HTML, JUnit, JavaScript, Servlets, Swing, DB2, and CSS.
- Actively involved in code review and bug fixing for improving the performance.
- Documented application for its functionality and its enhanced features.
- Successfully delivered all product deliverables that resulted with zero defects.
Environment: Spring MVC, Oracle (SQL, PL/SQL), J2EE, Java, Struts, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008.