Big Data Spark Developer Resume
Bloomington, IL
SUMMARY:
- 8+ years of technical IT experience across all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing and deployment of software systems.
- Designed and implemented data ingestion techniques for data coming from various data sources.
- Hands-on experience with Hadoop ecosystem components such as Spark, Hive, Pig, Sqoop, Flume, ZooKeeper, Kafka, HBase and MapReduce.
- Extensive understanding of Hadoop architecture, workload management, schedulers, scalability and components such as YARN and MapReduce; strong SQL programming, including HiveQL for Hive, along with Pig and HBase.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Experience converting SQL queries into Spark transformations using Spark RDDs and Scala, including map-side (broadcast) joins on RDDs; a brief sketch follows this summary.
- Experience working with Teradata and preparing its data for batch processing on distributed computing frameworks.
- Experience importing and exporting data with Sqoop between HDFS and relational/non-relational database systems.
- Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
- Hands-on experience creating data pipelines using Kafka, Flume and Storm for security fraud and compliance violation use cases.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experienced in performing analytics on time-series data using HBase and its Java API.
- Ability to tune Big Data solutions to improve performance and end-user experience.
- Good understanding of cloud configuration in Amazon web services (AWS) and Azure.
- Hands-on experience with AWS infrastructure services, including Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), as well as Microsoft Azure.
- Involved in creating Hive tables, loading them with data and writing ad-hoc Hive queries that run internally on MapReduce and Tez.
- Replaced existing MR jobs and Hive scripts with Spark SQL & Spark data transformations for efficient data processing.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Strong understanding of real-time streaming technologies such as Spark Streaming and Kafka.
- Knowledge of job workflow management and coordination tools such as Oozie.
- Strong experience building end-to-end data pipelines on the Hadoop platform.
- Worked in Agile and Waterfall methodologies across the SDLC; used JIRA for development and project tracking.
- Participated in daily scrum meetings and sprint planning.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
- Hands-on experience with VPN, PuTTY, WinSCP, etc.; responsible for creating Hive tables based on business requirements.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Managed multiple tasks while working under tight deadlines in fast-paced environments.
- Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance.
- Strong communication and analytical skills; a good team player and quick learner, organized and self-motivated.
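Illustrative sketch (hypothetical, not project code) of converting a SQL join-and-aggregate into Spark RDD transformations in Scala, using a broadcast variable for a map-side join, as referenced above. File paths, column positions and names are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sql-to-rdd-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical inputs: a large fact file of (customerId, ..., amount) rows
        // and a small lookup file of (customerId, region) rows.
        val orders = sc.textFile("hdfs:///data/orders.csv")
          .map(_.split(","))
          .map(f => (f(0), f(2).toDouble))

        val customers = sc.textFile("hdfs:///data/customers.csv")
          .map(_.split(","))
          .map(f => (f(0), f(1)))

        // Map-side join: broadcast the small lookup table instead of shuffling both sides,
        // the RDD equivalent of "SELECT region, SUM(amount) ... JOIN ... GROUP BY region".
        val customerLookup = sc.broadcast(customers.collectAsMap())

        val revenueByRegion = orders
          .map { case (custId, amount) =>
            (customerLookup.value.getOrElse(custId, "UNKNOWN"), amount)
          }
          .reduceByKey(_ + _)

        revenueByRegion.saveAsTextFile("hdfs:///output/revenue_by_region")
        spark.stop()
      }
    }

Broadcasting the small dimension table avoids shuffling the large fact RDD, which is the usual motivation for a map-side join.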
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans
IDEs: IntelliJ, Eclipse, Spyder, Jupyter
Operating Systems: Windows, Linux
Programming languages: Python, Scala, Linux shell scripts, ColdFusion, PL/SQL, C, C++, Java
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, HBASE
Web Servers: Web Logic, Web Sphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Tools: Web Intelligence, Crystal Reports, Dashboard Design, WebI Rich Client
PROFESSIONAL EXPERIENCE:
Confidential - Bloomington, IL
Big Data Spark Developer
Responsibilities:
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS through processing and analyzing it in HDFS.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded them from Avro-backed Hive tables; a brief sketch follows this list.
- Developed Spark API to import data into HDFS from Teradata and create Hive tables.
- Ran Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Developed Sqoop jobs to import data in Avro file format from an Oracle database into HDFS and created Hive tables on top of it.
- Experience with AWS EMR, Spark installation, and HDFS and MapReduce architecture.
- Good knowledge on Spark, Scala and Hadoop distributions like Apache Hadoop, Cloudera.
- Good experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Kafka, Cassandra, Spark SQL, Spark Streaming and Flink.
- Developed a Flume ETL job that handled data from an HTTP source with HDFS as the sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer API in Scala for consuming data from Kafka topics.
- Developed Spark jobs in Scala in the test environment for faster real-time analytics, using Spark SQL for querying.
- Performed hands-on data manipulation, transformation and predictive modeling.
- Developed Oozie workflows to ingest and parse raw data, populate staging tables and store the refined data in partitioned Hive tables.
- Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Responsible for defining, developing and communicating key metrics and business trends to partner and management teams.
- Experience in designing and developing Spark applications using Scala.
- Experience in scheduling, distributing and monitoring jobs using Spark core.
- Used Spark Streaming in Scala to receive real-time data from Kafka and persist the stream to HDFS and databases such as HBase; a brief sketch follows this project's environment line.
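Illustrative sketch (hypothetical, not project code) of the partitioned, bucketed Parquet Hive tables with Snappy compression mentioned above, loaded from an Avro-backed staging table through Spark SQL with Hive support. Database, table and column names are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object ParquetHiveLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("avro-to-parquet-hive")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned, bucketed target table stored as Parquet with Snappy compression.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS analytics.events_parquet (
            |  event_id   STRING,
            |  user_id    STRING,
            |  event_type STRING
            |)
            |PARTITIONED BY (event_date STRING)
            |CLUSTERED BY (user_id) INTO 32 BUCKETS
            |STORED AS PARQUET
            |TBLPROPERTIES ('parquet.compression'='SNAPPY')""".stripMargin)

        // Allow dynamic partitions; Spark 2.x does not produce Hive-compatible buckets,
        // so bucketing enforcement is relaxed for this insert.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("SET hive.enforce.bucketing=false")
        spark.sql("SET hive.enforce.sorting=false")

        // Dynamic-partition insert from the Avro-backed staging table.
        spark.sql(
          """INSERT OVERWRITE TABLE analytics.events_parquet PARTITION (event_date)
            |SELECT event_id, user_id, event_type, event_date
            |FROM staging.events_avro""".stripMargin)

        spark.stop()
      }
    }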
Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.
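Illustrative sketch (hypothetical, not project code) of the Kafka consumer and Spark Streaming ingestion described in this project, assuming the spark-streaming-kafka-0-10 integration. Broker addresses, group id, topic name and output path are placeholder assumptions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-to-hdfs")
        val ssc = new StreamingContext(conf, Seconds(30))

        // Kafka consumer configuration; all values are placeholders.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092,broker2:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "events-consumer",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch to HDFS; downstream jobs can load
        // these files into Hive or HBase.
        stream.map(record => record.value)
          .foreachRDD { (rdd, time) =>
            if (!rdd.isEmpty()) {
              rdd.saveAsTextFile(s"hdfs:///data/events/raw/${time.milliseconds}")
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }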
Confidential - Chevy Chase, MD
Spark Developer
Responsibilities:
- Wrote the BRD and technical design documents for data ingestion.
- Created and loaded the Hive tables and scheduled the data ingestion jobs.
- Transformed the data received from source systems using Python and created the files to load into Hive.
- Transformed the data to make it available for analytics jobs.
- Developed the Sqoop jobs.
- Contributed to best practices for data extraction, integration and analysis.
- Compiled competitive information and external benchmarking data for development.
- Performed structured analysis of the portfolio, making recommendations to maximize value creation within budget and resourcing constraints.
- Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
- Created Hive tables as internal or external tables per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency.
- Estimated hardware requirements for the NameNode and DataNodes and planned the cluster.
- Developed a framework to import data from databases into HDFS using Sqoop, and developed HQL queries to extract data from Hive tables for reporting.
- Wrote MapReduce jobs to cleanse the data and copy it from our cluster to the AWS cluster.
- Used an open-source Python web scraping framework to crawl and extract data from web pages.
- Moved relational database data into Hive dynamic-partition tables with Sqoop, using staging tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
- Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data in Hive.
- Used Spark SQL to join multiple Hive tables, write the result to a final Hive table and store it on S3; a brief sketch follows this list.
- Performed querying of both managed and external tables created by Hive using Impala.
- Utilized an Apache Hadoop environment based on the Cloudera distribution.
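Illustrative sketch (hypothetical, not project code) of the Spark SQL join-and-publish pattern described above, joining Hive tables and writing the final table to S3. Table names, columns and the bucket are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object HiveJoinToS3Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("multi-hive-join-to-s3")
          .enableHiveSupport()
          .getOrCreate()

        // Read a Parquet extract plus two existing Hive tables; all names are placeholders.
        val claims   = spark.read.parquet("hdfs:///landing/claims_parquet")
        val policies = spark.table("warehouse.policies")
        val members  = spark.table("warehouse.members")

        // Join the three sources and keep only the reporting columns.
        val reporting = claims
          .join(policies, Seq("policy_id"))
          .join(members, Seq("member_id"))
          .select("claim_id", "policy_id", "member_id", "claim_amount", "claim_date")

        // Persist the final table as Parquet on S3, registered in the Hive metastore.
        reporting.write
          .mode("overwrite")
          .option("path", "s3a://example-bucket/warehouse/claims_reporting")
          .saveAsTable("warehouse.claims_reporting")

        spark.stop()
      }
    }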
Environment: Hadoop 2, HDFS, Spark 2.2, Scala, Java, Kafka, Hive, HiveQL, Oozie, Sqoop, Impala, Tradmill, Git, HBase.
Confidential - St. Louis, MO
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
- Designed and developed database objects like Tables, Views, Stored Procedures, User Functions using PL/SQL, SQL Developer and used them in WEB components.
- Implemented Spark jobs using Python and Spark SQL for faster data processing and real-time analysis algorithms in Spark.
- Load and transform large sets of structured, semi structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and ZooKeeper.
- Involved in creating Hive tables, working on them using HiveQL and performing data analysis using Hive and Pig.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Worked on Apache Flume to collect and aggregate large amounts of log data and stored it on HDFS for further analysis.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from web logs and store it in HDFS.
- Imported data from Teradata to HDFS through Informatica mappings invoked by Unix scripts.
Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, Zookeeper.
Confidential
Hadoop Developer
Responsibilities:
- Written Hive queries to transform the data into tabular format and process the results using Hive Query Language.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Analyzed the functional specifications.
- Implemented Pig scripts according to business rules.
- Implemented Hive tables and HQL Queries for the reports.
- Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal DB.
- Participated in Solr schema design and ingested data into Solr for indexing.
- Wrote MapReduce programs to organize the data and make it suitable for analytics in the client-specified format.
- Wrote Python scripts to optimize performance.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
- Implemented Bloom filters in Cassandra during keyspace creation.
- Supported and assisted QA engineers in understanding, testing and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Worked on tuning the performance of MapReduce Jobs.
- Responsible to manage data coming from different sources.
- Load and transform large sets of structured, semi structured and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Developed Python/Django application for Analytics aggregation and reporting.
- Used Django configuration to manage URLs and application parameters.
- Generated Python Django Forms to record data of online users.
- Used Python and Django for graphics creation, XML processing, data exchange and business logic.
- Created Oozie workflows to run multiple MapReduce, Hive and Pig jobs.
Environment: Cloudera, Hadoop, Pig, Sqoop, Python, Hive, HBase, Java, Eclipse, MySQL, MapReduce, Hcatalog.
Confidential
Java Developer
Responsibilities:
- Designed use case diagrams, class diagrams and sequence diagrams using Microsoft Visio tool.
- Extensively used Spring IoC, Hibernate and core Java features such as exceptions and collections.
- Deployed the applications on IBM WebSphere Application Server.
- Utilized various utilities like Struts Tag Libraries, JSP, JavaScript, HTML, & CSS.
- Built and deployed WAR files on the WebSphere application server.
- Implemented Patterns such as Singleton, Factory, Facade, Prototype, Decorator, Business Delegate and MVC.
- Involved in frequent meetings with clients to gather business requirements and convert them into technical specifications for the development team.
- Involved in the bug-fixing and enhancement phase; used the FindBugs tool.
- Used SVN for version control.
- Developed application in Eclipse IDE.
- Primarily involved in front-end UI using HTML5, CSS3, JavaScript, jQuery, and AJAX.
- Used struts framework to build MVC architecture and separate presentation from business logic.
- Involved in rewriting middle-tier on WebLogic application server.
- Developed the administrative UI using Angular.js and Ext JS.
- Generated Stored Procedures using PL/SQL language.
Environment: JDK 1.5, JSP, Servlet, EJB, Spring, JavaScript, Hibernate, jQuery, Struts, Design Patterns, HTML, CSS, JMS, XML, Apache, Oracle ECM, Web services, SOAP.
Confidential
Java Developer
Responsibilities:
- Primarily involved in front-end UI using HTML5, CSS3, JavaScript, jQuery, and AJAX.
- Used struts framework to build MVC architecture and separate presentation from business logic.
- Generated Stored Procedures using PL/SQL language.
- Designed the database tables using normalization concepts & implemented cascading delete relationships between different transaction tables.
- Used XSLT to transform XML documents into HTML documents.
- Used various design patterns such as Facade, Service Delegate, Factory, Singleton and DAO.
- Involved in supporting the application, including defect fixing and minor enhancements.
Environment: Core Java, Spring Framework, SOAP Web services, Oracle 11g application Server, JUnit, DAO, SOAP UI, Eclipse IDE, JAX-RPC, SVN, XML.