Hadoop/Spark Developer Resume
Phoenix, AZ
SUMMARY:
- 8+ years of professional IT experience in analysis, development, integration, and maintenance of web-based and client/server applications using Java and Big Data technologies.
- 4 years of relevant experience in Hadoop Ecosystem and architecture (HDFS, Spark, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie).
- Experience in all phases of the software development life cycle (SDLC), including user interaction, business analysis/modelling, design/architecture, development, implementation, integration, documentation, testing, and deployment.
- Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Hue, and JSON.
- Experience reading data from file systems into Spark RDDs.
- Good understanding of real-time data processing using Spark.
- Ingested data using Sqoop from various RDBMSs such as Oracle, MySQL, and Microsoft SQL Server into HDFS.
- Integration of OBIEE, ODI, Tableau with Hive.
- Experienced in WAMP (Windows, Apache, MySQL, and Python/PHP) and LAMP (Linux, Apache, MySQL, and Python/PHP) architecture.
- Good experience in developing web applications that implement the Model-View-Controller (MVC) architecture using the Django, Flask, Pyramid, and Zope Python web application frameworks.
- Experience in implementing open-source frameworks such as Spring, Hibernate, and Web Services.
- Experience in continuous integration and continuous deployment using tools such as Jenkins.
- Experience in processing streaming data on clusters through Kafka and Spark Streaming.
- Experience with databases such as Oracle 9i, PostgreSQL, and MySQL Server, including cluster setup and writing SQL queries, triggers, and stored procedures.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- Experienced in using Spark to improve the performance of, and optimize, existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Proficient in working with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Good knowledge of the column-family NoSQL database HBase.
- Good knowledge of Hadoop MRv1 and MRv2 (YARN) architecture.
- Communicated with diverse client communities offshore and onshore, with a dedication to client satisfaction and quality outcomes. Extensive experience in coordinating offshore development activities.
- Highly organized and dedicated, with a positive attitude, good time management and organizational skills, and the ability to handle multiple tasks.
- Experience working across multiple industries with Fortune 500 customers and government agencies.
TECHNICAL SKILLS:
Big Data components: Hadoop/Big Data HDFS, MapReduce, HBase, Pig, Cassandra, Chukwa, Hive, Scala, Sqoop, Oozie, Kettle, Kafka, ZooKeeper, MongoD
Programming Languages: Java (J2SE, J2EE), C, C#, PL/SQL, Swift, SQL+, ASP.NET, JDBC, Python
Mobile Development: Android, iOS application development with Swift, Objective-C
Web Development: JavaScript, jQuery, HTML5, CSS3, AJAX, JSON
Development Tools: NetBeans 8.0.2, Visual Studio 2013, Eclipse Neon, Android Studio, SQL Developer
Testing Tools: JUnit, HP Unified Functional Testing, HP Performance Center, Selenium, WinRunner, LoadRunner, QTP
UNIX Tools: Apache, Yum, RPM
Operating Systems: Windows, Linux, Ubuntu, Mac OS, Red Hat Linux
Protocols: TCP/IP, HTTP and HTTPS
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari
Methodologies: Agile, V-model, Waterfall model
Databases: HBase, MongoDB, Cassandra, Oracle 10g, MySQL, Couch, MS SQL Server
Encryption Tools: VeraCrypt, AxCrypt, BitLocker, GNU Privacy Guard
PROFESSIONAL EXPERIENCE
Hadoop/Spark Developer
Confidential, Phoenix, AZ
Responsibilities:
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Created end-to-end Spark-Solr applications using Scala to perform various data cleansing, validation, transformation, and summarization activities according to requirements.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build data pipelines.
- Provided cluster coordination services through ZooKeeper.
- Ran Hadoop Streaming jobs to process terabytes of XML-format data.
- Automated all jobs that pull data from an FTP server and load it into Hive tables, using Oozie workflows.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate to MapReduce jobs.
- Developed Oozie workflows for scheduling and orchestrating the ETL process. Designed and implemented Java MapReduce programs to support distributed data processing.
- Worked with highly unstructured and semi-structured data of 30 TB in size (90 TB with replication factor of 3).
- Contributed towards developing a Data Pipeline to load data from different sources like Web, RDBMS, NoSQL to Apache Kafka or Spark cluster.
- Migrated data from Spark RDDs into HDFS and NoSQL stores such as Cassandra and HBase.
- Handled build and release management; built application logic using Python 2.7.
- Worked on reading multiple data formats on HDFS using PySpark.
- Very good knowledge of Python scripting.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Worked on the core and Spark SQL modules of Spark extensively.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
Environment: Hadoop, HDFS, Hive, Scala, Spark, SQL, Teradata, UNIX Shell Scripting, Big Data, MapReduce, Sqoop, Oozie, Pig, Flume, Linux, Java, Eclipse
Hadoop/Scala Developer
Confidential - Malvern, PA
Responsibilities:
- Worked in a multi-cluster Hadoop ecosystem environment.
- Created MapReduce programs using the Java API that filter out unnecessary records and find unique records based on different criteria.
- Designed and developed the UI of the website using HTML5, XHTML, AJAX, CSS3, BIG DATA and JavaScript.
- Loaded and transformed large sets of unstructured data from UNIX systems into HDFS.
- Used Apache Sqoop to dump user data into HDFS on a weekly basis.
- Created production jobs using Oozie workflows that integrated different actions such as MapReduce, Sqoop, and Hive.
- Created end-to-end Spark-Solr applications using Scala to perform various data cleansing, validation, transformation, and summarization activities according to requirements.
- Good knowledge of the Spark ecosystem and Spark architecture.
- Used the Scala collections framework to store and process complex employer information. Requests were post-processed and assigned offers based on the offers set up for each client.
- Created tables, loaded them with data, and wrote Hive queries which run internally in map
- Developed a SOAP web service as a publisher/producer.
- Used Akka as a framework to create reactive, distributed, parallel and resilient concurrent applications in Scala.
- Involved in developing a linear regression model, built using Spark with the Scala API, to predict a continuous measurement for improving observations on wind turbine data.
- Good knowledge of writing Spark applications using Python and Scala.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Developed various GUI screens as JSPs using HTML, JavaScript, and CSS.
- Designed the user interface of the application using AngularJS, Bootstrap, HTML5, CSS3, and JavaScript.
Environment: Hadoop MapReduce, Scala, Hive, HDFS, Java (JDK 1.6), JMS, Spring (IoC, AOP), CSV files, Python, Django, Angular.js, Bootstrap, AWS, XML, MySQL, HTML, XHTML, CSS, AJAX, JavaScript, Jenkins, Apache Web Server, Linux.
Hadoop Developer
Confidential - Washington, DC
Responsibilities:
- Worked on a large-scale Hadoop cluster for distributed data processing and analysis using Sqoop, Hive, Pig, and MapReduce.
- Translated data from multiple sources into useful information and business drivers utilized by senior management for strategic decision making.
- Created Hive tables and charts using worksheet data and external resources, modified Hive tables, sorted items and grouped data, and refreshed and formatted Hive tables.
- Extracted, compiled, tracked, and analysed data to generate reports.
- Synthesized large amounts of diffuse data from both public and internal databases, maintaining major prediction.
- Conducted data mining, data modelling, statistical analysis, business intelligence gathering, trending, and benchmarking.
- Wrote shell scripts to run cron jobs that automate the data migration process from external servers and FTP sites.
- Used Tableau for visualization and to generate reports for financial data consolidation, reconciliation, and segmentation.
- Designed and developed scripts to transfer files between servers using FTP/SFTP according to business requirements.
- Implemented machine learning techniques such as clustering and regression on Tableau and created interactive dashboards.
- Used the Python unittest library to test many Python programs and blocks of code.
- Parsed JSON and XML data using Python.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Developed tools using Python, Shell scripting, XML, BIG DATA to automate some of the menial tasks.
- Managed and reviewed Hadoop log files.
- Supported the full testing cycle for ETL processes, including bug fixes.
- Performed upgrades, package administration, and support for over 200 Linux servers.
- Performed automated installation of CentOS operating system using kickstart.
- Monitored physical and virtual servers remotely using Nagios monitoring tool.
- Installed, configured, and maintained Red Hat Linux 5.x/6.x and CentOS servers using kickstart and interactive installation procedures.
- Installed and configured antivirus agents on configured VMs.
Environment: HDFS, Hive, Pig, Sqoop, ZooKeeper, Oozie, ETL, Tableau, Hive Query
Java Developer
Confidential
Responsibilities:
- Participated in re-design of the application using Java, JSP, Servlets, Java Beans, XML, AdvantNet SNMP and MySQL technologies.
- Wrote PL/SQL queries, stored procedures, and triggers to perform back-end database operations.
- Used multiple action controllers to control the page flow.
- Worked in UI team to develop new customer facing portal for Long Term Care Partners.
- Implemented Java APIs using core Java.
- Wrote new features in Golang.
- Used JDBC to establish connection between the database and the application.
- Used AJAX for client-to-server communication.
- Created the user interface using HTML, CSS and JavaScript.
- Developed code that creates XML files and flat files from data retrieved from databases and XML files.
- Applied design patterns and OO design concepts to improve the existing Java/J2EE-based code base.
- Developed JAX-WS web services.
- Wrote implementation proposals with design alternatives for the ENUM+ and IPWorks 5.0 upgrade work packages; configured a MySQL Cluster with 4 Solaris systems and integrated it with IPWorks.
- Designed and developed ENUM+ object storage in MySQL Cluster, synchronized with the DNS server using Java multi-threading concepts.
- Built a single-page application (SPA) that loads multiple views through route services using Angular 2 and Node.js.
- Created Angular 2 components and implemented interpolation, input variables, bootstrapping, NgFor, NgIf, router outlet, event binding, and decorators.
- Migrated the legacy system implemented in Perl to Golang.
- Used JavaScript, AJAX, HTML for front end.
- Used SQL to write complex queries.
- Received the "Feather in My Cap" award for outstanding work.
Environment: J2EE 5, Struts 2.0, Hibernate 3.0, MVC, WebLogic Application Server 10.3, UML, JSP, Servlets, JavaScript, HTML5, CSS, Ajax, Angular2, Web Services, JBoss, Oracle 10g, Eclipse 3.5 IDE, PL/SQL, ANT, JUnit, XML/XSL, Log4j 1.2.15.
PL/SQL Developer
Confidential
Responsibilities:
- Wrote Stored Procedures in PL/SQL.
- Performed table defragmentation, partitioning, compression, and indexing for improved performance and efficiency.
- Involved in table redesign, implementing partitioned tables and partitioned indexes to make the database faster and easier to maintain.
- Used the SQL Server SSIS tool to build high performance data integration solutions including extraction, transformation and load packages for data warehousing.
- Extracted data from the XML file and loaded it into the database.
- Created and modified SQL*Plus, PL/SQL and SQL*Loader scripts for data conversions.
- Worked on XML along with PL/SQL to develop and modify web forms.
- Designed data models and design specifications, and analyzed dependencies.
- Created indexes on tables to improve performance by eliminating full table scans, and views to hide the actual tables and reduce the complexity of large queries.
- Involved in creating UNIX shell scripts.
- Maintained the logical and physical structure of the database.
- Created tablespaces, tables, views, and scripts for automatic operation of database activities.
- Coded various stored procedures, packages, and triggers to incorporate business logic into the application.
Environment: Oracle 9i, 10g, PL/SQL, Erwin 4.1, C, C++, Oracle Designer 2000, Windows 2000, Toad, SQL*Plus.
