Big Data Spark Developer Resume
Bloomington, IL
SUMMARY:
- 8+ years of technical IT experience across all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing and deployment of software systems.
- Designed and implemented data ingestion techniques for data coming from various data sources.
- Hands-on experience with Hadoop ecosystem components such as Spark, Hive, Pig, Sqoop, Flume, ZooKeeper, Kafka, HBase and MapReduce.
- Extensive understanding of Hadoop architecture, workload management, schedulers, scalability and components such as YARN and MapReduce; strong SQL programming, including HiveQL for Hive, along with Pig and HBase.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Experience converting SQL queries into Spark transformations using Spark RDDs and Scala, including map-side (broadcast) joins on RDDs; a brief sketch follows this summary.
- Experience working with Teradata and preparing its data for batch processing on distributed computing frameworks.
- Experience importing and exporting data with Sqoop between HDFS and relational/non-relational database systems.
- Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
- Hands-on experience creating data pipelines using Kafka, Flume and Storm for security fraud and compliance violation use cases.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experienced in performing analytics on time-series data using HBase and its Java API.
- Ability to tune Big Data solutions to improve performance and end-user experience.
- Good understanding of cloud configuration in Amazon web services (AWS) and Azure.
- Hands-on experience with AWS infrastructure services, including Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), as well as Microsoft Azure.
- Involved in creating Hive tables, loading them with data and writing ad-hoc Hive queries that run internally on MapReduce and Tez.
- Replaced existing MR jobs and Hive scripts with Spark SQL & Spark data transformations for efficient data processing.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Strong understanding of real-time streaming technologies such as Spark Streaming and Kafka.
- Knowledge of job workflow management and coordination tools such as Oozie.
- Strong experience building end-to-end data pipelines on the Hadoop platform.
- Worked in Agile and Waterfall methodologies across the SDLC; used JIRA for development and project tracking.
- Participated in daily scrum meetings and sprint planning.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
- Hands-on experience with VPN, PuTTY, WinSCP, etc.; responsible for creating Hive tables based on business requirements.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Managed multiple tasks while working under tight deadlines in fast-paced environments.
- Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance.
- Strong communication and analytical skills; a good team player and quick learner, organized and self-motivated.
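Illustrative sketch (hypothetical, not project code) of converting a SQL join-and-aggregate into Spark RDD transformations in Scala, using a broadcast variable for a map-side join, as referenced above. File paths, column positions and names are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sql-to-rdd-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical inputs: a large fact file of (customerId, ..., amount) rows
        // and a small lookup file of (customerId, region) rows.
        val orders = sc.textFile("hdfs:///data/orders.csv")
          .map(_.split(","))
          .map(f => (f(0), f(2).toDouble))

        val customers = sc.textFile("hdfs:///data/customers.csv")
          .map(_.split(","))
          .map(f => (f(0), f(1)))

        // Map-side join: broadcast the small lookup table instead of shuffling both sides,
        // the RDD equivalent of "SELECT region, SUM(amount) ... JOIN ... GROUP BY region".
        val customerLookup = sc.broadcast(customers.collectAsMap())

        val revenueByRegion = orders
          .map { case (custId, amount) =>
            (customerLookup.value.getOrElse(custId, "UNKNOWN"), amount)
          }
          .reduceByKey(_ + _)

        revenueByRegion.saveAsTextFile("hdfs:///output/revenue_by_region")
        spark.stop()
      }
    }

Broadcasting the small dimension table avoids shuffling the large fact RDD, which is the usual motivation for a map-side join.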
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans
IDEs: IntelliJ, Eclipse, Spyder, Jupyter
Operating Systems: Windows, Linux
Programming languages: Python, Scala, Linux shell scripts, ColdFusion, PL/SQL, C, C++, Java
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, HBASE
Web Servers: Web Logic, Web Sphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Tools: Web Intelligence, Crystal Reports, Dashboard Design, WebI Rich Client
PROFESSIONAL EXPERIENCE:
Confidential - Bloomington, IL
Big Data Spark Developer
Responsibilities:
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS through processing and analyzing it in HDFS.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded them from Avro-backed Hive tables; a brief sketch follows this list.
- Developed Spark API to import data into HDFS from Teradata and create Hive tables.
- Ran Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Developed Sqoop jobs to import data in Avro file format from an Oracle database into HDFS and created Hive tables on top of it.
- Experience with AWS EMR, Spark installation, and HDFS and MapReduce architecture.
- Good knowledge on Spark, Scala and Hadoop distributions like Apache Hadoop, Cloudera.
- Good experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Kafka, Cassandra, Spark SQL, Spark Streaming and Flink.
- Developed a Flume ETL job that handled data from an HTTP source with HDFS as the sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer API in Scala for consuming data from Kafka topics.
- Developed Spark jobs in Scala in the test environment for faster real-time analytics, using Spark SQL for querying.
- Performed hands-on data manipulation, transformation and predictive modeling.
- Developed Oozie workflows to ingest and parse raw data, populate staging tables and store the refined data in partitioned Hive tables.
- Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Responsible for defining, developing and communicating key metrics and business trends to partner and management teams.
- Experience in designing and developing Spark applications using Scala.
- Experience in scheduling, distributing and monitoring jobs using Spark core.
- Used Spark Streaming in Scala to receive real-time data from Kafka and persist the stream to HDFS and databases such as HBase; a brief sketch follows this project's environment line.
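Illustrative sketch (hypothetical, not project code) of the partitioned, bucketed Parquet Hive tables with Snappy compression mentioned above, loaded from an Avro-backed staging table through Spark SQL with Hive support. Database, table and column names are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object ParquetHiveLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("avro-to-parquet-hive")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned, bucketed target table stored as Parquet with Snappy compression.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS analytics.events_parquet (
            |  event_id   STRING,
            |  user_id    STRING,
            |  event_type STRING
            |)
            |PARTITIONED BY (event_date STRING)
            |CLUSTERED BY (user_id) INTO 32 BUCKETS
            |STORED AS PARQUET
            |TBLPROPERTIES ('parquet.compression'='SNAPPY')""".stripMargin)

        // Allow dynamic partitions; Spark 2.x does not produce Hive-compatible buckets,
        // so bucketing enforcement is relaxed for this insert.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("SET hive.enforce.bucketing=false")
        spark.sql("SET hive.enforce.sorting=false")

        // Dynamic-partition insert from the Avro-backed staging table.
        spark.sql(
          """INSERT OVERWRITE TABLE analytics.events_parquet PARTITION (event_date)
            |SELECT event_id, user_id, event_type, event_date
            |FROM staging.events_avro""".stripMargin)

        spark.stop()
      }
    }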
Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.
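Illustrative sketch (hypothetical, not project code) of the Kafka consumer and Spark Streaming ingestion described in this project, assuming the spark-streaming-kafka-0-10 integration. Broker addresses, group id, topic name and output path are placeholder assumptions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-to-hdfs")
        val ssc = new StreamingContext(conf, Seconds(30))

        // Kafka consumer configuration; all values are placeholders.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092,broker2:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "events-consumer",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch to HDFS; downstream jobs can load
        // these files into Hive or HBase.
        stream.map(record => record.value)
          .foreachRDD { (rdd, time) =>
            if (!rdd.isEmpty()) {
              rdd.saveAsTextFile(s"hdfs:///data/events/raw/${time.milliseconds}")
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }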
Confidential - Chevy Chase, MD
Spark Developer
Responsibilities:
- Wrote the BRD and technical design documents for data ingestion.
- Created and loaded the Hive tables and scheduled the data ingestion jobs.
- Transformed the data received from source systems using Python and created the files to load into Hive.
- Transformed the data to make it available for analytics jobs.
- Developed the Sqoop jobs.
- Contributed to best practices for data extraction, integration and analysis.
- Compiled competitive information and external benchmarking data for development.
- Performed structured analysis of the portfolio, making recommendations to maximize value creation within budget and resourcing constraints.
- Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
- Created Hive tables as internal or external tables per requirements, defined with appropriate static or dynamic partitions and bucketing for efficiency.
- Estimated hardware requirements for the NameNode and DataNodes and planned the cluster.
- Developed a framework to import data from databases into HDFS using Sqoop, and developed HQL queries to extract data from Hive tables for reporting.
- Wrote MapReduce jobs to cleanse the data and copy it from our cluster to the AWS cluster.
- Used an open-source Python web scraping framework to crawl and extract data from web pages.
- Moved relational database data into Hive dynamic-partition tables with Sqoop, using staging tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
- Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data in Hive.
- Used Spark SQL to join multiple Hive tables, write the result to a final Hive table and store it on S3; a brief sketch follows this list.
- Performed querying of both managed and external tables created by Hive using Impala.
- Utilized an Apache Hadoop environment based on the Cloudera distribution.
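Illustrative sketch (hypothetical, not project code) of the Spark SQL join-and-publish pattern described above, joining Hive tables and writing the final table to S3. Table names, columns and the bucket are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object HiveJoinToS3Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("multi-hive-join-to-s3")
          .enableHiveSupport()
          .getOrCreate()

        // Read a Parquet extract plus two existing Hive tables; all names are placeholders.
        val claims   = spark.read.parquet("hdfs:///landing/claims_parquet")
        val policies = spark.table("warehouse.policies")
        val members  = spark.table("warehouse.members")

        // Join the three sources and keep only the reporting columns.
        val reporting = claims
          .join(policies, Seq("policy_id"))
          .join(members, Seq("member_id"))
          .select("claim_id", "policy_id", "member_id", "claim_amount", "claim_date")

        // Persist the final table as Parquet on S3, registered in the Hive metastore.
        reporting.write
          .mode("overwrite")
          .option("path", "s3a://example-bucket/warehouse/claims_reporting")
          .saveAsTable("warehouse.claims_reporting")

        spark.stop()
      }
    }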
Environment: Hadoop 2, HDFS, Spark 2.2, Scala, Java, Kafka, Hive, HiveQL, Oozie, Sqoop, Impala, Tradmill, Git, HBase.
Confidential - St. Louis, MO
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
- Designed and developed database objects like Tables, Views, Stored Procedures, User Functions using PL/SQL, SQL Developer and used them in WEB components.
- Implemented Spark jobs using Python and Spark SQL for faster data processing and real-time analysis algorithms in Spark.
- Load and transform large sets of structured, semi structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and ZooKeeper.
- Involved in creating Hive tables, working on them using HiveQL and performing data analysis using Hive and Pig.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Worked on Apache Flume to collect and aggregate large amounts of log data and stored it on HDFS for further analysis.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from web logs and store it in HDFS.
- Imported data from Teradata to HDFS through Informatica mappings invoked by Unix scripts.
Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, Zookeeper.
Confidential
Hadoop Developer
Responsibilities:
- Written Hive queries to transform the data into tabular format and process the results using Hive Query Language.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Analyzed the functional specifications.
- Implemented Pig scripts according to business rules.
- Implemented Hive tables and HQL Queries for the reports.
- Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal DB.
- Participated in Solr schema design and ingested data into Solr for indexing.
- Wrote MapReduce programs to organize the data and make it suitable for analytics in the client-specified format.
- Wrote Python scripts to optimize performance.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
- Implemented Bloom filters in Cassandra during keyspace creation.
- Supported and assisted QA engineers in understanding, testing and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Worked on tuning the performance of MapReduce Jobs.
- Responsible to manage data coming from different sources.
- Load and transform large sets of structured, semi structured and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Developed Python/Django application for Analytics aggregation and reporting.
- Used Django configuration to manage URLs and application parameters.
- Generated Python Django Forms to record data of online users.
- Used Python and Django for graphics creation, XML processing, data exchange and business logic.
- Created Oozie workflows to run multiple MapReduce, Hive and Pig jobs.
Environment: Cloudera, Hadoop, Pig, Sqoop, Python, Hive, HBase, Java, Eclipse, MySQL, MapReduce, Hcatalog.
Confidential
Java Developer
Responsibilities:
- Designed use case diagrams, class diagrams and sequence diagrams using Microsoft Visio tool.
- Extensively used Spring IoC, Hibernate and core Java features such as exceptions and collections.
- Deployed the applications on IBM WebSphere Application Server.
- Utilized various utilities like Struts Tag Libraries, JSP, JavaScript, HTML, & CSS.
- Built and deployed WAR files on the WebSphere application server.
- Implemented Patterns such as Singleton, Factory, Facade, Prototype, Decorator, Business Delegate and MVC.
- Involved in frequent meetings with clients to gather business requirements and convert them into technical specifications for the development team.
- Involved in the bug-fixing and enhancement phase; used the FindBugs tool.
- Used SVN for version control.
- Developed application in Eclipse IDE.
- Primarily involved in front-end UI using HTML5, CSS3, JavaScript, jQuery, and AJAX.
- Used struts framework to build MVC architecture and separate presentation from business logic.
- Involved in rewriting middle-tier on WebLogic application server.
- Developed the administrative UI using Angular.js and Ext JS.
- Generated Stored Procedures using PL/SQL language.
Environment: JDK 1.5, JSP, Servlet, EJB, Spring, JavaScript, Hibernate, jQuery, Struts, Design Patterns, HTML, CSS, JMS, XML, Apache, Oracle ECM, Web services, SOAP.
Confidential
Java Developer
Responsibilities:
- Primarily involved in front-end UI using HTML5, CSS3, JavaScript, jQuery, and AJAX.
- Used struts framework to build MVC architecture and separate presentation from business logic.
- Generated Stored Procedures using PL/SQL language.
- Designed the database tables using normalization concepts & implemented cascading delete relationships between different transaction tables.
- Used XSLT to transform XML documents into HTML documents.
- Used various design patterns such as Facade, Service Delegate, Factory, Singleton and DAO.
- Involved in supporting the application, including defect fixing and minor enhancements.
Environment: Core Java, Spring Framework, SOAP Web services, Oracle 11g application Server, JUnit, DAO, SOAP UI, Eclipse IDE, JAX-RPC, SVN, XML.