Sr. Big Data/Spark Developer Resume
Ohio
PROFESSIONAL SUMMARY:
- Big Data developer with around 9 years of professional IT experience, including 4 years in the field of Big Data.
- Extensive experience working with various Hadoop distributions, including enterprise versions of Cloudera and Hortonworks, and good knowledge of the MapR distribution and Amazon EMR.
- In-depth experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
- Extensive knowledge of Hadoop architecture and its components.
- Good knowledge of installing, configuring, monitoring, and troubleshooting Hadoop clusters and their ecosystem components.
- Exposure to Data Lake Implementation using Apache Spark.
- Developed data pipelines and applied business logic using Spark.
- Well versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Extensively worked on Spark streaming and Apache Kafka to fetch live stream data.
- Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Experience in integrating Hive queries into Spark environment using Spark SQL.
- Expertise in performing real time analytics on big data using HBase and Cassandra.
- Handled importing data from RDBMS into HDFS using Sqoop and vice-versa.
- Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka.
- Experience in developing data pipeline using Pig, Sqoop, and Flume to extract the data from weblogs and store in HDFS.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Hands-on experience with tools like Oozie and Airflow to orchestrate jobs.
- Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Expertise in Cluster management and configuring Cassandra Database.
- Strong familiarity with creating Hive tables, writing Hive joins and HQL queries, and developing complex Hive UDFs.
- Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
- Worked with different compression codecs (LZO, Snappy, GZIP) and file formats (ORC, Avro, TextFile, Parquet).
- Experience in practical implementation of AWS technologies including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
- Built AWS secured solutions by creating VPC with public and private subnets.
- Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
- Expertise working in Java, J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
- Developed web page interfaces using JSP, Java Swing, and HTML.
- Experience working with the Spring and Hibernate frameworks for Java.
- Worked with various programming languages using IDEs like Eclipse, NetBeans, and IntelliJ.
- Excelled in using version control tools like PVCS, SVN, VSS, and Git.
- Developed web-based UIs using JavaScript, jQuery, jQuery UI, CSS, HTML, HTML5, and XHTML.
- Development experience in DBMSs like Oracle, MS SQL Server, Teradata, and MySQL.
- Developed stored procedures and queries using PL/SQL.
- Experience with best practices of web services development and integration (both REST and SOAP).
- Experienced in using build tools like Ant, Gradle, SBT, Maven to build and deploy applications into the server.
- Knowledge of Unified Modeling Language (UML) and expertise in Object Oriented Analysis and Design (OOAD).
- Experience in the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
- Knowledge of creating dashboards and data visualizations using Tableau to provide business insights.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
TECHNICAL SKILLS:
Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.
Big Data: HDFS, MapReduce, HIVE, PIG, HBase, SQOOP, Oozie, Zookeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB
J2EE Standards: JDBC, JNDI, JMS, Java Mail & XML Deployment Descriptors.
Web/Distributed Technologies: J2EE, Servlets 2.1/2.2, JSP 2.0, Struts 1.1, Hibernate 3.0, JSF, JSTL 1.1, EJB 1.1/2.0, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring 2.0, CORBA, Java Threads.
Operating System: Windows 95/98/NT/2000/XP, MS-DOS, UNIX, multiple flavors of Linux.
Databases / NoSQL: Oracle 10g, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, Greenplum, and MongoDB.
Browser Languages: HTML, XHTML, CSS, XML, XSL, XSD, XSLT.
Browser Scripting: JavaScript, HTML DOM, DHTML, AJAX.
App/Web Servers: IBM WebSphere 5.1.2/5.0/4.0/3.5, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.
GUI Environment: Swing, AWT, Applets.
Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.
Networking Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3.
Testing & CASE Tools: JUnit, Log4j, Rational ClearCase, CVS, ANT, Maven, JBuilder.
Version Control Systems: Git, SVN, CVS
PROFESSIONAL EXPERIENCE:
Confidential - Ohio
Sr. Big Data/Spark Developer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Used Sqoop to import data from Relational Databases like MySQL, Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Responsible for fetching real time data using Kafka and processing using Spark and Scala.
- Used Kafka to import real-time weblogs and ingested the data into Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
- Worked on Hive to implement Web Interfacing and stored the data in Hive tables.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Experienced with SparkContext, Spark SQL, and Spark on YARN.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Involved in data querying and summarization using Hive and Pig, and created UDFs, UDAFs, and UDTFs.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
- Developed traits, case classes, and related constructs in Scala.
- Developed Spark scripts using Scala shell commands as per the business requirement.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Experienced in loading real-time data into NoSQL databases such as Cassandra.
- Well versed in data manipulation and compaction in Cassandra.
- Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language).
- Worked on connecting Cassandra database to the Amazon EMR File System for storing the database in S3.
- Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Deployed the project on Amazon EMR with S3 connectivity to set up backup storage.
- Well versed in using Elastic Load Balancer with Auto Scaling for EC2 servers.
- Configured workflows that involve Hadoop actions using Oozie.
- Used Python for pattern matching in build logs to format warnings and errors.
- Coordinated with SCRUM team in delivering agreed user stories on time for every sprint.
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.
Confidential - Coraopolis, PA
Hadoop/Big Data Analyst
Responsibilities:
- Developed MapReduce programs to parse and filter raw data and store the refined data in partitioned tables in Greenplum.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with Greenplum reference tables and historical metrics.
- Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Involved in running MapReduce jobs for processing millions of records.
- Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Responsible for Data Modeling in Cassandra as per our requirement.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Managed and scheduled jobs on a Hadoop cluster using Oozie and cron jobs.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Used Elasticsearch and MongoDB for storing and querying the offers and non-offers data.
- Created UDFs to calculate the pending payment for the given Residential or Small Business customer, and used in Pig and Hive Scripts.
- Built and deployed the application using Maven.
- Maintained Hadoop, Hadoop ecosystem components, third-party software, and databases with updates/upgrades, performance tuning, and monitoring using Ambari.
- Obtained good experience with the NoSQL database Cassandra.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Experience in managing and reviewing Hadoop log files.
- Experienced in moving data from Hive tables into Cassandra for real-time analytics.
- Used Python scripting for large-scale text processing utilities.
- Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
- Involved in NoSQL (DataStax Cassandra) database design, integration, and implementation.
- Implemented CRUD operations involving lists, sets, and maps in DataStax Cassandra.
- Responsible for data modeling in MongoDB to load both structured and unstructured data.
- Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
- Participated in development/implementation of the Cloudera Hadoop environment.
- Created tables, inserted data, and executed various Cassandra Query Language (CQL 3) commands on tables from Java code and using the cqlsh command-line client.
- Wrote test cases in MRUnit for unit testing of MapReduce programs.
- Worked extensively on the user interface for a few modules using JSPs, JavaScript, and Ajax.
- Created business logic using Servlets and Session beans and deployed them on the WebLogic server.
- Developed templates and screens in HTML and JavaScript.
- Developed the XML Schema and web services for the data maintenance and structures.
- Provided technical support for production environments by resolving issues, analyzing defects, and implementing solutions.
- Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes.
Environment: HDFS, MapReduce, Hive, Pig, Cloudera, Impala, Oozie, Greenplum, MongoDB, Cassandra, Kafka, Storm, Maven, Python, CloudManager, Nagios, Ambari, JDK, J2EE, Struts, JSP, Servlets, Elasticsearch, WebSphere, HTML, XML, JavaScript, MRUnit.
Confidential - Peoria, IL
Hadoop/Big Data Analyst
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience working on processing unstructured data using Pig and Hive.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling Oozie workflows to run multiple Hive and Pig jobs.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Extensively used Pig for data cleansing.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Implemented SQL, PL/SQL Stored Procedures.
- Actively involved in code review and bug fixing for improving the performance.
- Developed screens using JSP, DHTML, CSS, AJAX, JavaScript, Struts, Spring, Java and XML.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, LINUX, Cloudera, Big Data, Java APIs, Java collection, SQL, AJAX.
Confidential - San Francisco, CA
Java/J2EE Developer
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
- Followed Agile methodology with Scrum meetings to track, optimize, and tailor features to customer needs.
- Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
- Implemented Model View Controller (MVC) architecture using the Jakarta Struts 1.3 framework at the presentation tier.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Implemented SOA architecture with web services using JAX-RS (REST) and JAX-WS (SOAP).
- Developed various Enterprise Java Bean components to fulfill the business functionality.
- Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Used Core Java and object-oriented concepts.
- Extensively used Hibernate 3.0 in data access layer to access and update information in the database.
- Used Spring 2.0 Framework for Dependency injection and integrated it with the Struts Framework and Hibernate.
- Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
- Proficient in writing SQL queries, stored procedures for multiple databases, Oracle 10g and SQL Server 2005.
- Wrote stored procedures using PL/SQL and performed query optimization to speed up indexing and make the system more scalable.
- Used Java Message Service (JMS) for reliable, asynchronous exchange of important information such as payment status reports.
- Used web services (WSDL and REST) to get credit card information from a third party, and used SAX and DOM XML parsers for data retrieval.
- Extensively used IBM RAD 7.0 for writing code.
- Implemented Persistence layer using Hibernate to interact with Oracle 10g and SQL Server 2005 databases.
- Used ANT scripts to build the application and deployed it on WebSphere Application Server.
Environment: Core Java, J2EE, WebLogic 9.2, Oracle 10g, SQL Server, JSP, Struts, JDK, JSF, JAX-RS (REST), JAX-WS (SOAP), JMS, Hibernate, JavaScript, HTML, CSS, IBM RAD 7.0, AJAX, JSTL, ANT 1.7 build tool, JUnit, Spring, Log4j, Web Services.
Confidential
Java Developer
Responsibilities:
- Extensively involved in the design and development of JSP screens to suit specific modules.
- Converted the application’s console printing of process information to proper logging technology using log4j.
- Developed the business components (in core Java) used in the JSP screens.
- Involved in the implementation of logical and physical database design by creating suitable tables, views, and triggers.
- Developed related procedures and functions used by JDBC calls in the above components.
- Extensively involved in performance tuning of Oracle queries.
- Created components to extract application messages stored in xml files.
- Executed UNIX shell scripts for command-line administrative access to the Oracle database and for scheduling backup jobs.
- Created WAR files and deployed them on the web server.
- Performed source and version control using VSS.
- Involved in maintenance support.
Environment: JDK, HTML, JavaScript, XML, JSP, Servlets, JDBC, Oracle 9i, Eclipse, Toad, UNIX Shell Scripting, MS Visual SourceSafe, Windows 2000.
Confidential
Junior Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Designed tables and indexes.
- Extensively worked on JUnit for testing the application code of server-client data transferring.
- Developed and enhanced products in design and in alignment with business objectives.
- Used SVN as a repository for managing/deploying application code.
- Participated in system integration and user acceptance tests.
- Developed the front end using JSTL, JSP, HTML, and JavaScript.
- Wrote complex SQL queries and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Actively involved in the system testing.
- Involved in implementing service layer using Spring IOC module.
- Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.
Environment: Java, JSP, JSTL, HTML, JavaScript, Servlets, JDBC, MySQL, JUnit, Eclipse IDE.
