Sr. Big Data Developer Resume
Boston, MA
SUMMARY
- Over 10 years of IT experience as a Big Data/Hadoop Developer across all phases of the Software Development Life Cycle, including Java/J2EE technologies.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
- Well versed in Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
- Improved the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Proficient in Core Java and enterprise technologies such as EJB, Hibernate, Java web services, SOAP, REST services, Java threads, Java sockets, Servlets, JSP and JDBC.
- Good exposure to Service-Oriented Architectures (SOA) built on web services (WSDL) using the SOAP protocol.
- Wrote multiple MapReduce programs in Python for data extraction, transformation and aggregation across multiple file formats, including XML, JSON, CSV and various compressed formats.
- Experience working with the Hadoop ecosystem, including limited experience installing and configuring the Hortonworks and Cloudera (CDH3 and CDH4) distributions.
- Experience with NoSQL databases HBase, MongoDB and Cassandra.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming.
- Expertise in data migration, data profiling, data cleansing, transformation, integration, data import and data export using ETL tools such as Informatica PowerCenter.
- Experience designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB and Spark.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD and SQL*Loader.
- Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM and MDM.
- Experienced in R and Python for statistical computing; also experienced with Spark MLlib, MATLAB, Excel, Minitab, SPSS and SAS.
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
TECHNICAL SKILLS
- Big Data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka and Spark 2.0/2.0.2
- NoSQL Databases: HBase, MongoDB 3.2 & Cassandra
- Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
- Programming Languages: Java, Python, SQL, PL/SQL, AWS, HiveQL, Unix Shell Scripting, Scala
- IDE and Tools: Eclipse 4.6, NetBeans 8.2, BlueJ
- Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014
- Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery and CSS3/2, JSP, Bootstrap 3/3.5
- Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
- Operating Systems: Windows 8/7, UNIX/Linux and Mac OS.
- Other Tools: Maven, ANT, WSDL, SOAP, REST.
- Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE)
PROFESSIONAL EXPERIENCE
Confidential - Boston, MA
Sr. Big Data Developer / Data Warehouse Developer
Responsibilities:
- Worked as a Sr. Big Data Developer with Hadoop Ecosystems components.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Involved in Agile methodologies, daily scrum meetings and sprint planning.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Worked on MongoDB by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
- Designed the HBase row key to store text and JSON as key values, structuring the row key so that gets and scans return data in sorted order.
- Strong working experience in data analysis, SQL design and development, and the implementation and testing of data warehousing using extraction, transformation and loading (ETL) tools and Teradata.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used the Java Persistence API (JPA) framework for object-relational mapping based on POJO classes.
- Responsible for fetching real time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
Environment: Agile, Hadoop 3.0, MS Azure, MapReduce, Java, MongoDB 4.0.2, HBase 1.2, JSON, Scala 2.12, Oozie 4.3, Zookeeper 3.4, J2EE, Python 3.7, JQuery, NoSQL, MVC, Struts 2.5.17, Hive 2.3
Confidential - Dallas, TX
Sr. Hadoop Developer
Responsibilities:
- Worked on Spark SQL to handle structured data in Hive.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data residing on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
Environment: HDFS, MapReduce, Storm, Hive, Pig, Sqoop, MongoDB, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, SOLR, GIT, Maven.
Confidential - Franklin Lakes, NJ
Sr. Big Data Engineer / Data Warehouse Developer
Responsibilities:
- As a Sr. Big Data Engineer, worked on Big Data technologies such as Apache Hadoop, MapReduce, shell scripting and Hive.
- Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig and Sqoop.
- Designed and developed Big Data analytics solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Created an SSIS package to extract, validate and load data into the data warehouse.
- Worked on Hive table creation and partitioning.
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
Environment: Hive 2.3, MapReduce, Hadoop 3.0, HDFS, Oracle, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python, PL/SQL, NoSQL, SSIS, SSRS, Visio, AWS Redshift, Teradata, SQL, PostgreSQL, EC2, S3, Windows
Confidential - Florham Park, NJ
Jr. Data Analyst/Data Modeler
Responsibilities:
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Supported business analysis and marketing campaign analytics with data mining, data processing and investigation to answer complex business questions.
- Developed scripts that automated DDL and DML statements used in creations of databases, tables, constraints, and updates.
- Planned and defined system requirements as use cases, use case scenarios and use case narratives using UML (Unified Modeling Language) methodologies.
- Gathered analysis report prototypes from business analysts across different business units; participated in JAD sessions discussing various reporting needs.
Environment: PL/SQL, Erwin 8.5, MS SQL 2012, OLTP, ODS, OLAP, SSIS, Transact-SQL, Teradata SQL Assistant