Hadoop/Spark Developer Resume
Phoenix, AZ
SUMMARY
- 8+ years of professional IT experience in analysis, development, integration, and maintenance of web-based and client/server applications using Java and Big Data technologies.
- 4 years of relevant experience in Hadoop Ecosystem and architecture (HDFS, Spark, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie).
- Experience in all phases of the software development life cycle (SDLC), including user interaction, business analysis/modelling, design/architecture, development, implementation, integration, documentation, testing, and deployment.
- Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Hue, and JSON.
- Experience reading data from the file system into Spark RDDs.
- Good understanding of real-time data processing using Spark.
- Ingested data using Sqoop from various RDBMSs such as Oracle, MySQL, and Microsoft SQL Server into HDFS.
- Integrated OBIEE, ODI, and Tableau with Hive.
- Experienced in WAMP (Windows, Apache, MySQL, and Python/PHP) and LAMP (Linux, Apache, MySQL, and Python/PHP) architectures.
- Good experience developing web applications that implement the Model-View-Controller (MVC) architecture using the Django, Flask, Pyramid, and Zope Python web application frameworks.
- Experience implementing open-source frameworks such as Spring and Hibernate, and web services.
- Experience in Continuous Integration and Continuous Deployment using tools such as Jenkins.
- Experience processing streaming data on clusters using Kafka and Spark Streaming.
- Experience with databases such as Oracle 9i, PostgreSQL, and MySQL Server, including cluster setup and writing SQL queries, triggers, and stored procedures.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- Experienced in using Spark to improve the performance of, and optimize, existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Proficient in working with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Good knowledge of the NoSQL database HBase (column-family database).
- Good knowledge of Hadoop MRv1 and MRv2 (YARN) architecture.
- Communicated with diverse client communities offshore and onshore, with a focus on client satisfaction and quality outcomes. Extensive experience coordinating offshore development activities.
- Highly organized and dedicated, with a positive attitude, good time management and organizational skills, and the ability to handle multiple tasks.
- Experience working across multiple industries with Fortune 500 customers and government agencies.
TECHNICAL SKILLS
Big Data components: Hadoop, HDFS, MapReduce, HBase, Pig, Cassandra, Hive, Scala, Sqoop, Oozie, Kettle, Kafka, ZooKeeper, MongoDB
Programming Languages: Java (J2SE, J2EE), C, C#, PL/SQL, Swift, SQL+, ASP.NET, JDBC, Python
Mobile Development: Android, iOS application development with Swift, Objective-C
Web Development: JavaScript, jQuery, HTML5, CSS3, AJAX, JSON
Development Tools: NetBeans 8.0.2, Visual Studio 2013, Eclipse Neon, Android Studio, SQL Developer, AWS (Import/Export)
Testing Tools: JUnit, HP Unified Functional Testing, HP Performance Center, Selenium, WinRunner, LoadRunner, QTP
UNIX Tools: Apache, Yum, RPM
Operating Systems: Windows, Linux, Ubuntu, Mac OS, Red Hat Linux
Protocols: TCP/IP, HTTP and HTTPS
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari
Methodologies: Agile, V-model, Waterfall model
Databases: HBase, MongoDB, Cassandra, Oracle 10g, MySQL, Couch, MS SQL Server
Encryption Tools: VeraCrypt, AxCrypt, BitLocker, GNU Privacy Guard
PROFESSIONAL EXPERIENCE
Hadoop/Spark Developer
Confidential, Phoenix, AZ
Responsibilities:
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Created end-to-end Spark-Solr applications in Scala to perform data cleansing, validation, transformation, and summarization according to requirements.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build data pipelines.
- Good knowledge of the Spark ecosystem and Spark architecture.
- Used ZooKeeper for cluster coordination services.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Automated all jobs that pull data from an FTP server and load it into Hive tables using Oozie workflows.
- Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate to MapReduce jobs.
- Developed Oozie workflows for scheduling and orchestrating the ETL process. Designed and implemented Java MapReduce programs to support distributed data processing.
- Worked with highly unstructured and semi-structured data of 30 TB in size (90 TB with replication factor of 3).
- Contributed to developing a data pipeline to load data from sources such as web, RDBMS, and NoSQL into Apache Kafka or a Spark cluster.
- Migrated data from Spark RDDs into HDFS and NoSQL stores such as Cassandra and HBase.
- Worked on reading multiple data formats on HDFS using PySpark.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and YARN distributions.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Worked extensively on the Core and Spark SQL modules of Spark.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
Environment: Hadoop, HDFS, Hive, Scala, Spark, SQL, Teradata, UNIX Shell Scripting, Big Data, MapReduce, Sqoop, Oozie, Pig, ZooKeeper, Flume, Linux, Java, Eclipse, Python 2.7, Cloudera
Hadoop/Scala Developer
Confidential - Malvern, PA
Responsibilities:
- Created, validated, and maintained scripts to load data manually using Sqoop.
- Created Oozie workflows and coordinators to automate weekly and monthly Sqoop jobs.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
- Developed, validated, and maintained HiveQL queries.
- Read data from and wrote data to HBase using MapReduce jobs.
- Analyzed data with Hive and Pig.
- Designed Hive tables to load data to and from external tables.
- Wrote DistCp shell scripts to load data across servers.
- Ran executive reports using Hive and QlikView.
- Loaded and transformed large sets of unstructured data from UNIX systems into HDFS.
- Used Apache Sqoop to load user data into HDFS on a weekly basis.
- Created production jobs using Oozie workflows that integrated different actions such as MapReduce, Sqoop, and Hive.
- Used the Scala collections framework to store and process complex employer information. Requests were post-processed and assigned offers based on the offers set up for each client.
- Used Akka as a framework to create reactive, distributed, parallel, and resilient concurrent applications in Scala.
- Successfully migrated Django database from SQLite to MySQL with complete data integrity.
- Involved in developing a linear regression model, built with the Spark Scala API, to predict a continuous measurement and improve observations on wind turbine data.
- Good knowledge of writing Spark applications using Python and Scala.
- Developed Spark scripts using Scala shell commands as required.
Environment: Hortonworks Hadoop, Hadoop stack (Hive, Pig, HCatalog, Sqoop, Oozie), QlikView, Windows 8, SQL Server 2010, Bitbucket, Scala, Python Django, Unix
Hadoop Developer
Confidential - Washington, DC
Responsibilities:
- Created, validated, and maintained scripts to load data from and into tables in Oracle PL/SQL and SQL Server 2008 R2.
- Wrote stored procedures and triggers.
- Converted, tested, and validated Oracle scripts for SQL Server.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Used Solr for database integration from IBM Maximo to SQL Server.
- Upgraded IBM Maximo database from 5.2 to 7.5.
- Created AWS S3 buckets, performed folder management in each bucket, and managed CloudTrail logs and objects within each bucket.
- Analyzed, validated, and documented the changed records for the IBM Maximo web application.
- Imported data from a MySQL database into Hive using Sqoop.
- Implemented OASISBI.
- Wrote MapReduce jobs.
- Developed, validated, and maintained HiveQL queries.
- Ran reports using Pig and Hive queries.
- Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
- Installed and configured Hue.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as IBM uDeploy, Puppet, or custom-built tools; designed cloud-hosted solutions, with specific AWS product suite experience.
- Used JUnit for unit testing.
- Conducted data mining, data modelling, statistical analysis, business intelligence gathering, trending, and benchmarking using Datameer.
- Used Tableau for visualization and to generate reports for financial data consolidation, reconciliation, and segmentation.
- Designed and developed scripts to transfer files between servers using FTP/SFTP according to business requirements.
- Implemented machine learning techniques such as clustering and regression in Tableau and created interactive dashboards.
- Managed and reviewed Hadoop log files.
- Supported the full testing cycle for ETL processes, including bug fixes.
- Performed upgrades, package administration, and support for over 200 Linux servers.
- Performed automated installation of the CentOS operating system using Kickstart.
Environment: HDFS, Hive, Pig, Sqoop, ZooKeeper, Oozie, ETL, AWS, Tableau, Hive Query, CentOS
Java Developer
Confidential
Responsibilities:
- Participated in the redesign of the application using Java, JSP, Servlets, JavaBeans, XML, AdventNet SNMP, and MySQL technologies.
- Wrote PL/SQL queries, stored procedures, and triggers to perform back-end database operations.
- Used multiple Action Controllers to control the page flow.
- Worked in the UI team to develop a new customer-facing portal for Long Term Care Partners.
- Implemented Java APIs using core Java.
- Wrote new features in Golang.
- Used JDBC to establish a connection between the database and the application.
- Used AJAX for client-to-server communication.
- Created the user interface using HTML, CSS and JavaScript.
- Developed code to create XML files and flat files from data retrieved from databases and XML files.
- Applied design patterns and OO design concepts to improve the existing Java/J2EE-based code base.
- Developed JAX-WS web services.
- Expertise in script programming, including Bash shell, JavaScript, and Python.
- Wrote implementation proposals with design alternatives for ENUM+ and IPWorks 5.0 upgrade work packages; configured MySQL Cluster with 4 Solaris systems and integrated it with IPWorks.
- Designed and developed ENUM+ object storage in MySQL Cluster, synchronized with the DNS server, using Java multithreading concepts.
- Built an SPA that loads multiple views using route services with Angular 2 and Node.js.
- Created Angular 2 components and implemented interpolation, input variables, bootstrapping, NgFor, NgIf, Router Outlet, event binding, and decorators.
- Migrated the legacy system implemented in Perl to Golang.
- Used JavaScript, AJAX, and HTML for the front end.
- Used SQL to write complex queries.
Environment: J2EE 5, Struts 2.0, Hibernate 3.0, MVC, WebLogic Application Server 10.3, UML, JSP, Servlets, JavaScript, HTML5, CSS, Ajax, Angular 2, Web Services, JBoss, Eclipse 3.5 IDE, PL/SQL, Ant, JUnit, XML/XSL, Log4j 1.2.15.
PL/SQL Developer
Confidential
Responsibilities:
- Wrote Stored Procedures in PL/SQL.
- Performed defragmentation of tables, partitioning, compression, and indexing for improved performance and efficiency.
- Involved in table redesign, implementing partitioned tables and partitioned indexes to make the database faster and easier to maintain.
- Used the SQL Server SSIS tool to build high-performance data integration solutions, including extraction, transformation, and load packages for data warehousing.
- Extracted data from the XML file and loaded it into the database.
- Created and modified SQL*Plus, PL/SQL, and SQL*Loader scripts for data conversions.
- Worked on XML along with PL/SQL to develop and modify web forms.
- Designed data models and design specifications, and analyzed dependencies.
- Created indexes on tables to improve performance by eliminating full table scans, and created views to hide the actual tables and reduce the complexity of large queries.
- Involved in creating UNIX shell scripts.
- Maintained the logical and physical structure of the database.
- Created tablespaces, tables, views, and scripts for automating database activities.
- Coded various stored procedures, packages, and triggers to incorporate business logic into the application.
Environment: Oracle 9i, 10g, PL/SQL, Erwin 4.1, C, C++, Oracle Designer 2000, Windows 2000, Toad, SQL*Plus.
